<TeXmacs|2.1.4>

<style|<tuple|generic|old-lengths>>

<\body>
  <doc-data|<doc-title|Numerical computations>>

  <em|Reference>:\ 

  0. H.Abelson, G.Sussman and J.Sussman, <em|Structure and Interpretation of
  Computer Programs>.

  1. \<#6797\>\<#6210\>\<#68EE\> (Lin Chengsen), <em|Numerical Computation
  Methods>.

  2. R.Burden and J.Faires, <em|Numerical analysis>.

  3. \<#8303\>\<#91D1\>\<#71D5\> and \<#8881\>\<#4E9A\>\<#6E58\> (Fan Jinyan
  and Yuan Ya-xiang), <em|Numerical Methods for Systems of Nonlinear
  Equations>.

  4. \<#9AD8\>\<#7ACB\> (Gao Li), <em|Numerical Optimization Methods>.

  5. J.Nocedal and S.Wright, <em|Numerical Optimization>.

  6. Y.Yuan and J.Stoer, <em|A subspace study on conjugate gradient
  algorithms>, Z. Angew. Math. Mech. 75 (1995) 11, 69-77.

  <\table-of-contents|toc>
    <vspace*|1fn><with|font-series|bold|math-font-series|bold|1<space|2spc>Introduction>
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-1><vspace|0.5fn>

    <vspace*|1fn><with|font-series|bold|math-font-series|bold|2<space|2spc>Optimization
    problems> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-2><vspace|0.5fn>

    <with|par-left|1tab|2.1<space|2spc>Unconstrained optimization problem
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-3>>

    <with|par-left|1tab|2.2<space|2spc>Line search approach
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-4>>

    <with|par-left|2tab|2.2.1<space|2spc>Line search algorithm
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-5>>

    <with|par-left|2tab|2.2.2<space|2spc>Different methods for the descent
    direction <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-7>>

    <with|par-left|1tab|2.3<space|2spc>Trust region approach
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-9>>

    <with|par-left|2tab|2.3.1<space|2spc>Newton, Quasi-Newton, LMF and CG
    methods <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-10>>

    <with|par-left|2tab|2.3.2<space|2spc>Subspace method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-11>>

    <with|par-left|1tab|2.4<space|2spc>Strict-inequality constrained
    optimization problem <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-13>>

    <with|par-left|1tab|2.5<space|2spc>Constrained optimization problem
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-14>>

    <with|par-left|2tab|2.5.1<space|2spc>penalty function method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-15>>

    <with|par-left|2tab|2.5.2<space|2spc>Augmented Lagrange penalty function
    method <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-16>>

    <vspace*|1fn><with|font-series|bold|math-font-series|bold|3<space|2spc>Algebraic
    equations> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-17><vspace|0.5fn>

    <with|par-left|1tab|3.1<space|2spc>Single variable eqn
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-18>>

    <with|par-left|2tab|3.1.1<space|2spc>Bisection method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-19>>

    <with|par-left|2tab|3.1.2<space|2spc>Iteration-function method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-20>>

    <with|par-left|2tab|3.1.3<space|2spc>Mixed method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-21>>

    <with|par-left|2tab|3.1.4<space|2spc>Accelerating iteration: Aitken
    method and its super version <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-22>>

    <with|par-left|1tab|3.2<space|2spc>Linear eqns
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-23>>

    <with|par-left|2tab|3.2.1<space|2spc>Direct methods
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-24>>

    <with|par-left|2tab|3.2.2<space|2spc>Iterative methods
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-25>>

    <with|par-left|2tab|3.2.3<space|2spc>Jacobi method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-26>>

    <with|par-left|2tab|3.2.4<space|2spc>Gauss-Seidel method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-27>>

    <with|par-left|2tab|3.2.5<space|2spc>Successive-Over-Relaxation(SOR)
    method <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-28>>

    <with|par-left|2tab|3.2.6<space|2spc>Aitken method and its super version
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-29>>

    <with|par-left|2tab|3.2.7<space|2spc>The convergence problem
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-30>>

    <with|par-left|1tab|3.3<space|2spc>Non-linear eqns
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-31>>

    <with|par-left|2tab|3.3.1<space|2spc>Newton method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-32>>

    <with|par-left|2tab|3.3.2<space|2spc>Newton-Secant method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-33>>

    <with|par-left|2tab|3.3.3<space|2spc>Quasi-Newton method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-34>>

    <with|par-left|2tab|3.3.4<space|2spc>Levenberg-Marquardt method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-35>>

    <vspace*|1fn><with|font-series|bold|math-font-series|bold|4<space|2spc>Ordinary
    differential eqns: initial-value problems>
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-36><vspace|0.5fn>

    <with|par-left|1tab|4.1<space|2spc>One-step methods: Runge-Kutta method
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-37>>

    <with|par-left|1tab|4.2<space|2spc>Multi-steps methods: Adams methods
    <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-38>>

    <vspace*|1fn><with|font-series|bold|math-font-series|bold|5<space|2spc>Monte
    Carlo methods> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
    <no-break><pageref|auto-39><vspace|0.5fn>
  </table-of-contents>

  <section|Introduction>

  When studying scientific problems, we are often faced with practical
  tasks, such as <em|finding the minimum of a function>, <em|solving
  (algebraic or differential) equations> and <em|computing integrals>, which
  belong to the subject of <em|applied mathematics>. These problems are
  often hard to solve analytically, so numerical methods are needed. This
  note gives an introduction to solving such problems, and the algorithms
  are described in the language <verbatim|Scheme> (1975), a Lisp (1958)
  dialect, which is quite appropriate for describing these problems.

  <section|Optimization problems>

  <subsection|Unconstrained optimization problem>

  <strong|Problem:> A nonlinear unconstrained optimization problem is to
  find <math|x> attaining

  <\equation>
    min<around*|{|f<around*|(|x|)><around*|\||x\<in\>\<bbb-R\><rsup|n>|\<nobracket\>>|}><label|optimization-problem>.
  </equation>

  The general approach is to construct an iterative sequence
  <math|<around*|{|x<rsub|0>,x<rsub|1>,\<cdots\>,x<rsub|n>,\<cdots\>|}>>
  whose limit is the solution: <math|x=lim<rsub|n\<rightarrow\>\<infty\>>x<rsub|n>>,
  so that <math|x<rsub|n>> approximates the solution when <math|n> is large
  enough. The iteration relation often takes the form

  <\equation>
    x<rsub|k+1>=x<rsub|k>+s<rsub|k>d<rsub|k><label|iteration-relation-of-optimization>,
  </equation>

  where the <strong|direction> <math|d<rsub|k>> satisfies the <em|descent
  condition>,

  <\equation>
    d<rsub|k>\<cdot\>\<nabla\>f<around*|(|x<rsub|k>|)>\<less\>0<label|descent-condtion>,
  </equation>

  and the <strong|step size> <math|s<rsub|k>> is <em|positive>:
  <math|s<rsub|k>\<gtr\>0>.

  There are two main strategies to choose <math|s<rsub|k>> and
  <math|d<rsub|k>>: the <strong|line search strategy> and the <strong|trust
  region strategy>.

  The line search strategy first finds a descent direction
  <math|d<rsub|k>>, then tries to find an appropriate step <math|s<rsub|k>>,
  while the trust region strategy fixes a closed region
  <math|\<Delta\><rsub|k>> and finds the best (or at least an acceptable)
  step <math|s<rsub|k>> and direction <math|d<rsub|k>> in this region:
  <math|x<rsub|k>+s<rsub|k>d<rsub|k>\<in\>\<Delta\><rsub|k>>.

  We first present the line search strategy, then consider the trust region
  strategy.

  <subsection|Line search approach>

  <subsubsection|Line search algorithm>

  Since the descent direction varies from method to method but the
  line-search framework is almost the same, we first consider the step size
  <math|s<rsub|k>>, given a descent direction <math|d<rsub|k>>.

  Given a direction <math|d<rsub|k>>, solving for <math|s<rsub|k>> in

  <\equation*>
    min<rsub|s\<gtr\>0> f<around*|(|x<rsub|k>+s d<rsub|k>|)>
  </equation*>

  is a one-dimensional problem, but it is still hard to solve exactly,
  which means we need to introduce some conditions telling when an
  <math|s<rsub|k>> is good enough, so that the calculation in this direction
  can stop. We give two commonly used conditions: the <em|Armijo condition>
  and the <em|curvature condition>:

  <\definition>
    [<strong|Armijo condition> and <strong|curvature condition>]

    <\equation>
      f<around*|(|x<rsub|k>+s d<rsub|k>|)>\<leqslant\>f<around*|(|x<rsub|k>|)>+c<rsub|1>s\<nabla\>f<rsub|k>\<cdot\>d<rsub|k><label|armijo-condition>,
    </equation>

    <\equation>
      <around*|\||\<nabla\>f<around*|(|x<rsub|k>+s
      d<rsub|k>|)>\<cdot\>d<rsub|k>|\|>\<leqslant\>c<rsub|2><around*|\||\<nabla\>f<rsub|k>\<cdot\>d<rsub|k>|\|><label|curvature-condition>,
    </equation>

    where <math|c<rsub|1>,c<rsub|2>> satisfy

    <\equation*>
      0\<less\>c<rsub|1>\<less\>c<rsub|2>\<less\>1.
    </equation*>

    Note that the <strong|strong Wolfe condition> is just the combination
    of the above two conditions.

    Note: the Armijo condition guarantees that there exists a finite
    <math|s<rsub|\<ast\>>> which fails to satisfy it, so we can restrict our
    attention to the region <math|<around*|[|0,s<rsub|\<ast\>>|]>>.
  </definition>

  The line-search algorithm is also an iterative method, i.e., it generates
  an increasing sequence <math|<around*|{|s<rsub|0>=0,s<rsub|1>,\<cdots\>,s<rsub|i>,s<rsub|i+1>,\<cdots\>|}>>,
  in which some term <math|s<rsub|\<ast\>>> begins to satisfy the Armijo and
  curvature conditions. We first reduce it to a one-dimensional problem,
  i.e., define

  <\equation*>
    y<around*|(|s|)>\<assign\>f<around*|(|x<rsub|k>+s d<rsub|k>|)>,
  </equation*>

  and

  <\equation*>
    y<rprime|'><around*|(|s|)>=\<nabla\>f<around*|(|x<rsub|k>+s
    d<rsub|k>|)>\<cdot\>d<rsub|k>.
  </equation*>

  Thus the conditions are translated into the following conditions,

  <\eqnarray>
    <tformat|<table|<row|<cell|y<around*|(|s|)>>|<cell|\<leqslant\>>|<cell|y<around*|(|0|)>+c<rsub|1>s
    y<rprime|'><around*|(|0|)>,>>|<row|<cell|<around*|\||y<rprime|'><around*|(|s|)>|\|>>|<cell|\<leqslant\>>|<cell|c<rsub|2><around*|\||y<rprime|'><around*|(|0|)>|\|>,>>>>
  </eqnarray>

  with <math|0\<less\>c<rsub|1>\<less\>c<rsub|2>\<less\>1>.

  Since we hope <math|s<rsub|i+1>> is better than <math|s<rsub|i>>, we also
  require

  <\equation*>
    y<around*|(|s<rsub|i+1>|)>\<leqslant\>y<around*|(|s<rsub|i>|)>.
  </equation*>

  Thus we get the final conditions to the iterative seq
  <math|<around*|{|s<rsub|0>=0,s<rsub|1>,\<cdots\>|}>>:

  (a) Sufficient-Descent-Condition(sdc):

  <\equation>
    <choice|<tformat|<table|<row|<cell|y<around*|(|s|)>\<leqslant\>y<around*|(|0|)>+c<rsub|1>s
    y<rprime|'><around*|(|0|)>,>>|<row|<cell|y<around*|(|s|)>\<leqslant\>y<around*|(|s<rsub|i>|)>,>>>>>i=1,2,\<cdots\><label|sdc>.
  </equation>

  (b) Curvature-Condition(cc):

  <\equation>
    <around*|\||y<rprime|'><around*|(|s|)>|\|>\<leqslant\>c<rsub|2><around*|\||y<rprime|'><around*|(|0|)>|\|><label|cc>.
  </equation>

  The picture below shows the meaning of sdc and cc.

  <\big-figure|<with|gr-mode|<tuple|edit|math-at>|gr-frame|<tuple|scale|1cm|<tuple|0.5gw|0.5gh>>|gr-geometry|<tuple|geometry|1par|0.6par>|gr-grid|<tuple|empty>|gr-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-aspect|<tuple|<tuple|axes|none>|<tuple|1|none>|<tuple|10|none>>|gr-edit-grid|<tuple|empty>|gr-edit-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-arrow-end|\<gtr\>|gr-auto-crop|true|<graphics||<with|arrow-end|\<gtr\>|<line|<point|-3|2>|<point|6.0|2.0>>>|<with|dash-style|10|line-width|2ln|<spline|<point|-3|2>|<point|6.0|-1.3>>>|<spline|<point|-3|2>|<point|-2.2|0.0>|<point|-1.5|-1.0>|<point|-0.6|-0.5>|<point|1.0|1.4>|<point|2.0|-1.0>|<point|3.0|-2.3>|<point|4.4|-1.0>|<point|5.7|1.3>>|<with|fill-color|dark
  grey|dash-style|10|line-width|2ln|<cline|<point|-1.77446|-0.840628>|<point|-0.826795863875585|-0.756010862203049>|<point|-1.05192482762799|-0.937151546190836>|<point|-1.27038935740619|-1.00420979134943>|<point|-1.5|-1.0>>>|<with|fill-color|dark
  grey|dash-style|10|line-width|2ln|<cline|<point|2.70399|-2.16489>|<point|3.51634287484896|-2.13733868090028>|<point|3.356779619797|-2.236240796522>|<point|3.00000000000001|-2.29999999999999>>>|<line|<point|-2.6|0.972956>|<point|6.0|1.0>>|<with|opacity|40%|fill-color|dark
  cyan|<cline|<point|-2.59461|0.95939>|<point|-0.219387804298274|0.980442194909367>|<point|0.270373419190661|0.80086307963009>|<point|-0.17548847478443|0.0907309316276841>|<point|-0.818612261388542|-0.7478156579638>|<point|-1.26780196252372|-1.003824454608>|<point|-1.77224181188479|-0.843013503415153>|<point|-2.2|-8.88178419700125e-16>>>|<with|opacity|40%|fill-color|dark
  cyan|<cline|<point|1.59367|0.315653>|<point|1.99999999999999|-1.00000000000001>|<point|2.28004813924113|-1.61681547554277>|<point|2.66284400190881|-2.13093592030098>|<point|3.2152333499942|-2.28876949486246>|<point|3.68227225593638|-1.98947298064476>|<point|4.06233471347244|-1.51142471701807>|<point|4.54577105779716|-0.766782721192294>>>|<math-at|s<rsub|i>|<point|-3.0|0.936126863544746>>|<math-at|s<rsub|i+1>|<point|6.09705318163778|1.6335659478767>>|<with|arrow-end|\<gtr\>|<line|<point|-3|-2>|<point|-3.0|3.0>>>|<math-at|s<rsub|0>=0|<point|-3.91485315517926|1.90873462098161>>|<math-at|sdc|<point|-1.43341|0.0482372>>|<math-at|sdc|<point|2.82112|-0.988937>>|<math-at|cc|<point|0.662108083079772|-1.36993980685276>>|<with|arrow-end|\<gtr\>|<line|<point|0.544549543590422|-1.10054239978833>|<point|-1.08835679887935|-0.955896852759133>>>|<with|arrow-end|\<gtr\>|<line|<point|1.36061317634608|-1.32760616483662>|<point|3.00000000000001|-2.29999999999999>>>|<math-at|loop|<point|-2.76691692022754|-0.201779335890991>>|<math-at|loop|<point|1.46645|-0.942618>>|<math-at|zoom|<point|3.92030002266349|-1.70602211789526>>|<math-at|zoom|<point|-0.480900251355999|-0.688616219076597>>|<math-at|zoom|<point|0.513940336023284|1.44923270273846>>|<math-at|y<around*|(|s|)>|<point|-3.77073686995634|2.76306389734092>>>>>
    Region for <math|s<rsub|i+1>>: sdc(light-green) and cc(deep-green).
  </big-figure>

  <\scm-code>
    ;;Sufficient-Descent-Condition

    ;;args: ys = y(s), ys0 = y(s_i), y0 = y(0), dy0 = y'(0)

    (define (sdc c1 s)

    \ \ (lambda (s0 ys ys0 y0 dy0)

    \ \ \ \ (and (\<less\>= ys (+ y0 (* c1 s dy0)))

    \ \ \ \ \ \ \ \ \ (\<less\>= ys ys0))))

    ;;Curvature-Condition

    (define (cc c2 s)

    \ \ (lambda (dys dy0)

    \ \ \ \ (\<less\>= (abs dys) (* c2 (abs dy0)))))
  </scm-code>
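  As a usage sketch, take <math|y<around*|(|s|)>=s<rsup|2>-s>, so that
  <math|y<around*|(|0|)>=0>, <math|y<rprime|'><around*|(|0|)>=-1>, and at
  <math|s=1/2>: <math|y=-1/4>, <math|y<rprime|'>=0>. With the illustrative
  choices <math|c<rsub|1>=10<rsup|-4>>, <math|c<rsub|2>=0.9>:

  <\scm-code>
    ;;args of (sdc c1 s): s0 ys ys0 y0 dy0

    ((sdc 1e-4 1/2) 0 -1/4 0 0 -1) ;=\<gtr\> #t

    ;;args of (cc c2 s): dys dy0

    ((cc 0.9 1/2) 0 -1) ;=\<gtr\> #t
  </scm-code>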

  Now we can give the line-search-algorithm, as follows.

  <\equation*>
    \ <around*|(|loop s<rsub|0>=0 \ s<rsub|1>|)>:<around*|(|sdc?
    s<rsub|1>|)>\<rightarrow\><block|<tformat|<table|<row|<cell|yes:<around*|(|cc?
    s<rsub|1>|)>\<rightarrow\><block|<tformat|<table|<row|<cell|yes\<Rightarrow\>s<rsub|1>>>|<row|<cell|no:<around*|(|y<rprime|'><around*|(|s<rsub|1>|)>\<gtr\>0|)>\<rightarrow\><block|<tformat|<table|<row|<cell|yes\<Rightarrow\><around*|(|loop
    s<rsub|1> new<rsub|->s|)>>>|<row|<cell|no\<Rightarrow\><around*|(|zoom
    s<rsub|1> s<rsub|0>|)>>>>>>>>>>>>>|<row|<cell|no\<Rightarrow\><around*|(|zoom
    s<rsub|0> s<rsub|1>|)>>>>>>
  </equation*>

  <\scm-code>
    ;;---- Line-Search-Algorithm

    (define (line-search-algorithm dy c1 c2)

    \ \ ;; require 0\<less\>c1\<less\>c2\<less\>1

    \ \ (let loop ([s0 0] [s1 (choose-one-step 0)]);choose-one-step

    \ \ \ \ \ \ \ (if (~sdc~ s1)

    \ \ \ \ \ \ \ \ \ \ \ (if (~cc~ s1) s1 ;return

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (if (positive? (dy s1))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (zoom s1 s0) ;

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (loop s1 (choose-one-step s1))))

    \ \ \ \ \ \ \ \ \ \ \ (zoom s0 s1)))) ;

    ;1. ~sdc~ and ~cc~ are obtained from the above functions sdc and cc by
    fixing their parameters;\ 

    ;2. (choose-one-step s) returns a new step larger than s;

    ;3. zoom is defined in next algorithm;

    ;;note: choosing the new step is delicate: it should not be too large,
    but too small steps would be expensive!
  </scm-code>
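  A minimal sketch of <verbatim|choose-one-step> (our own, with a
  hypothetical upper bound <verbatim|s-max>): start from an initial guess
  and double the step thereafter, never exceeding <verbatim|s-max>:

  <\scm-code>
    (define (make-choose-one-step s-init s-max)

    \ \ (lambda (s)

    \ \ \ \ (if (= s 0) s-init (min (* 2 s) s-max))))
  </scm-code>

  Then <verbatim|(define choose-one-step (make-choose-one-step 1 100))>
  gives <verbatim|(choose-one-step 0)> <math|=1> and
  <verbatim|(choose-one-step 1)> <math|=2>.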

  In the above algorithm, the <verbatim|zoom> function on
  <math|<around*|(|s<rsub|c>,s<rsub|b>|)>> generates an
  <math|s\<equiv\>s<rsub|z>> satisfying the strong Wolfe condition between
  <math|s<rsub|c>> and <math|s<rsub|b>>, under the conditions:

  <\enumerate-alpha>
    <item>There exists an <math|s> between <math|s<rsub|c>> and
    <math|s<rsub|b>> satisfying the strong Wolfe condition;

    <item><math|s<rsub|c>> satisfies sdc but not cc;

    <item><math|s<rsub|b>> satisfies <math|y<rprime|'><around*|(|s<rsub|c>|)><around*|(|s<rsub|b>-s<rsub|c>|)>\<less\>0>.
  </enumerate-alpha>

  <\note>
    Please check that in the previous line-search-algorithm, the calls to
    <verbatim|zoom> satisfy the above three conditions. [ a) should be
    checked carefully! ]
  </note>

  <\scm-code>
    ;;---- zoom

    (define (zoom sc sb)

    \ \ (let ([s (try-one-step sc sb)]);try-one-step

    \ \ \ \ (if (~sdc~ s)

    \ \ \ \ \ \ \ \ (if (~cc~ s) s ;return

    \ \ \ \ \ \ \ \ \ \ \ \ (zoom s (if (negative? (* (dy s) (- sb sc)))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ sb

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ sc)))

    \ \ \ \ \ \ \ \ (zoom sc s)))

    ;try-one-step returns a step between sc and sb;

    ;one can use (/ (+ sc sb) 2) or the quadratic model given by yc,yb,dyc.
  </scm-code>
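  Following the last comment, here is a sketch of <verbatim|try-one-step>
  using the quadratic model <math|q<around*|(|s|)>=y<rsub|c>+d y<rsub|c><around*|(|s-s<rsub|c>|)>+a<around*|(|s-s<rsub|c>|)><rsup|2>>
  fitted to <math|y<rsub|b>> at <math|s<rsub|b>>; unlike the two-argument
  version above, the values <verbatim|yc>, <verbatim|yb>, <verbatim|dyc> are
  passed explicitly for clarity. It falls back to bisection when the model
  has no interior minimizer:

  <\scm-code>
    (define (try-one-step sc sb yc yb dyc)

    \ \ (let* ([h (- sb sc)]

    \ \ \ \ \ \ \ \ \ [a (/ (- yb yc (* dyc h)) (* h h))])

    \ \ \ \ (if (positive? a)

    \ \ \ \ \ \ \ \ (+ sc (/ (- dyc) (* 2 a))) ;minimizer of the model

    \ \ \ \ \ \ \ \ (/ (+ sc sb) 2)))) ;bisection fallback
  </scm-code>

  In practice the returned step should also be clamped to lie strictly
  between <math|s<rsub|c>> and <math|s<rsub|b>>.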

  <\note>
    Please check that at every call to <verbatim|zoom>, the conditions on
    <math|s<rsub|c>,s<rsub|b>> are still satisfied.

    [ a) should be checked carefully! ]
  </note>

  <subsubsection|Different methods to descent direction>

  (1) The simplest method is the steepest descent method (<strong|SD>),
  whose descent direction at <math|x<rsub|k>> is the negative gradient:

  <\equation>
    d<rsub|k><rsup|SD>=-grad<around*|(|x<rsub|k>|)>\<equiv\>-g<rsub|k><label|SD>.
  </equation>

  (2) The second method is the Newton method (<strong|Newton>), which is
  derived as follows.

  When <math|x<rsub|k>> is near the solution <math|x<rsub|\<ast\>>>, the
  Taylor expansion at <math|x<rsub|k>> gives

  <\equation*>
    f<around*|(|x|)>=f<around*|(|x<rsub|k>+\<delta\>|)>=f<rsub|k>+g<rsub|k>\<cdot\>\<delta\>+<frac|1|2>\<delta\><rsup|T>G\<delta\>+o<around*|(|<around*|\<\|\|\>|\<delta\>|\<\|\|\>><rsup|2>|)>,
  </equation*>

  then the minimum of <math|f<rsub|k>+g<rsub|k>\<cdot\>\<delta\>+<frac|1|2>\<delta\><rsup|T>G\<delta\>>
  is attained at <math|G\<delta\>=-g<rsub|k>>, which means we can use

  <\equation>
    d<rsub|k><rsup|Newton>=-G<rsup|-1>g<rsub|k><label|Newton>,
  </equation>

  and <math|s<rsub|k>=1>.

  <\note>
    Newton's method may fail when <math|G> is not invertible.
  </note>
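  As an illustrative sketch for <math|n=2> (the function name is ours), the
  Newton equation <math|G d=-g<rsub|k>> can be solved by Cramer's rule,
  which fails exactly when <math|det G=0>, i.e. when <math|G> is not
  invertible:

  <\scm-code>
    ;;G = ((g11 g12) (g21 g22)), g = (g1 g2); solves G d = -g

    (define (newton-direction-2d G g)

    \ \ (let* ([g11 (caar G)] [g12 (cadar G)]

    \ \ \ \ \ \ \ \ \ [g21 (caadr G)] [g22 (cadadr G)]

    \ \ \ \ \ \ \ \ \ [det (- (* g11 g22) (* g12 g21))]

    \ \ \ \ \ \ \ \ \ [b1 (- (car g))] [b2 (- (cadr g))])

    \ \ \ \ (list (/ (- (* b1 g22) (* g12 b2)) det)

    \ \ \ \ \ \ \ \ \ \ (/ (- (* g11 b2) (* b1 g21)) det))))
  </scm-code>

  E.g. for <math|G=diag<around*|(|2,4|)>> and
  <math|g=<around*|(|2,-4|)><rsup|T>> it returns <verbatim|(-1 1)>.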

  (3) Since the Hessian matrix <math|G=\<partial\><rsub|x<rsup|i>x<rsup|j>>f<around*|(|x<rsub|k>|)>>
  is hard to obtain or expensive to store, one wants some approximation to
  <math|G> or <math|G<rsup|-1>>. One class of such methods is called the
  quasi-Newton method (<strong|Quasi-Newton>).

  xxxxx to be completed xxxxx

  \;

  (4) A newer class of methods is the subspace method (<strong|Subspace>),
  including the conjugate-gradient methods (<strong|CG>), whose idea is to
  reduce the minimization problem to a lower-dimensional one.

  <\note>
    The subspace method may be better than the Newton or quasi-Newton
    method, because the reduced Hessian matrix <math|<wide|G|^>> can be
    solved by hand; and sometimes the full Hessian matrix <math|G> is
    invertible while the reduced one is not.
  </note>

  Below we give some methods, inspired by the Newton method, which only
  need the information at two points, the previous point and the current
  point: <math|<around*|(|x<rsub|0>,f<rsub|0>,g<rsub|0>\<equiv\>grad<rsub|0>|)>,<around*|(|x<rsub|1>,f<rsub|1>,g<rsub|1>\<equiv\>grad<rsub|1>|)>>,
  and give <math|x<rsub|2>=x<rsub|1>+d<rsub|1>>.

  <\example>
    [this method needs a new point <math|<wide|x|~>>]

    Suppose that we have got <math|x<rsub|0>> and
    <math|x<rsub|1>=x<rsub|0>+d<rsub|0>>, try to get
    <math|x<rsub|2>=x<rsub|1>+d<rsub|1>>, where
    <math|d<rsub|1>\<in\>span<around*|{|-g<rsub|1>,d<rsub|0>|}>>, i.e., let
    <math|<wide|g|^><rsub|1>=g<rsub|1>/<around*|\||g<rsub|1>|\|>> and
    <math|<wide|d|^><rsub|0>=d<rsub|0>/<around*|\||d<rsub|0>|\|>>, and
    consider

    <\equation*>
      x=x<rsub|1>-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>,
    </equation*>

    and try to find the best <math|\<mu\>,\<nu\>>, as shown in the
    following lemma.

    <\lemma*>
      (1) Let <math|G> be the Hessian matrix of <math|f<around*|(|x|)>> at
      <math|x<rsub|1>>, and assume <math|G> is
      <strong|positive-definite>; then, to second order, the minimum of
      <math|m<around*|(|s,\<beta\>|)>=f<around*|(|x<rsub|1>+s<around*|(|-<wide|g|^><rsub|1>+\<beta\><wide|d|^><rsub|0>|)>|)>>
      is attained at

      <\eqnarray>
        <tformat|<table|<row|<cell|\<beta\>>|<cell|=>|<cell|<frac|<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>-<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1><around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>-<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1><around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>,>>|<row|<cell|s>|<cell|=>|<cell|<around*|\||g<rsub|1>|\|><frac|1-\<beta\><wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>|<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>-2\<beta\><around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>+\<beta\><rsup|2><around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>,>>>>
      </eqnarray>

      where

      <\equation*>
        <around*|\<langle\>|\<xi\>,\<eta\>|\<rangle\>>\<assign\>\<xi\><rsup|T>G\<eta\>,<around*|\<langle\>|\<xi\>|\<rangle\>><rsup|2>\<equiv\><around*|\<langle\>|\<xi\>,\<xi\>|\<rangle\>>.
      </equation*>

      (2) To avoid the Hessian matrix, we can approximate
      <math|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>,<around*|\<langle\>|<wide|d|^><rsub|0>,g<rsub|1>|\<rangle\>>,<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>
      as follows,

      <\eqnarray>
        <tformat|<table|<row|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>|<cell|=>|<cell|<frac|<around*|\||g<rsub|1>|\|>|<around*|\||d<rsub|0>|\|>><around*|(|<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>-<frac|<around*|\||g<rsub|0>|\|>|<around*|\||g<rsub|1>|\|>><wide|d<rsub|0>|^>\<cdot\><wide|g|^><rsub|0>|)>,>>|<row|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>|<cell|=>|<cell|<frac|<around*|\||g<rsub|1>|\|>|<around*|\||d<rsub|0>|\|>><around*|(|1-<frac|<around*|\||g<rsub|0>|\|>|<around*|\||g<rsub|1>|\|>><wide|g|^><rsub|0>\<cdot\><wide|g|^><rsub|1>|)>,>>|<row|<cell|<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>|<cell|=>|<cell|<frac|<around*|\||g<rsub|1>|\|>|d<rsub|min>><around*|(|1-<frac|<around*|\||<wide|g|~>|\|>|<around*|\||g<rsub|1>|\|>><wide|<wide|g|~>|^>\<cdot\><wide|g|^><rsub|1>|)>,<wide|x|~>=x<rsub|1>+<wide|s|~><around*|(|-<wide|g|^><rsub|1>|)>,
        <wide|s|~>\<assign\>d<rsub|min>.>>>>
      </eqnarray>
    </lemma*>

    <\big-figure|<with|gr-mode|<tuple|edit|math-at>|gr-frame|<tuple|scale|1cm|<tuple|0.5gw|0.510001gh>>|gr-geometry|<tuple|geometry|1par|0.6par>|gr-grid|<tuple|empty>|gr-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-aspect|<tuple|<tuple|axes|none>|<tuple|1|none>|<tuple|10|none>>|gr-edit-grid|<tuple|empty>|gr-edit-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-line-width|2ln|gr-arrow-end|\<gtr\>|gr-auto-crop|true|<graphics||<line|<point|0.0|2.0>|<point|0.0|-2.0>>|<cspline|<point|-3|0>|<point|0.0|-1.0>|<point|3.0|0.0>|<point|0.0|1.0>>|<line|<point|-5.0|0.0>|<point|4.0|0.0>>|<with|dash-style|10|line-width|0.5ln|<line|<point|1|1.6>|<point|-4.2|0.0>>>|<point|-1.79823|0.730531>|<with|arrow-end|\<gtr\>|line-width|2ln|<line|<point|-1.79823|0.730531>|<point|-1.43833509723508|-0.536000132292631>>>|<point|-1.69155|0.355091>|<point|-0.581097|1.11351>|<math-at|x<rsub|0>|<point|-0.655163|1.41135>>|<math-at|<wide|x|~>|<point|-2.03100608546104|0.226005423997883>>|<with|arrow-end|\<gtr\>|dash-style|11100|line-width|2ln|<line|<point|-1.79823|0.730531>|<point|0.0|0.0>>>|<math-at|<with|font-series|bold|d><rsub|1>|<point|-0.747674038792533|0.303742604245924>>|<with|arrow-end|\<gtr\>|line-width|2ln|<line|<point|-0.581097|1.11351>|<point|-1.79823|0.730531>>>|<math-at|<with|font-series|bold|d><rsub|0>|<point|-1.34566|1.13319>>|<math-at|-<with|font-series|bold|g><rsub|1>|<point|-2.23327540894583|-0.583752074868228>>|<math-at|x<rsub|1>|<point|-2.23467059134806|0.859091811086122>>>>>
      Subspace method in line-search algorithm [note that
      <math|<frac|\<partial\>f|\<partial\>d<rsub|0>><around*|\||<rsub|x<rsub|1>>|\<nobracket\>>=<with|font-series|bold|d><rsub|0>\<cdot\><with|font-series|bold|g><rsub|1>\<sim\>0>]
    </big-figure>
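    The lemma's formulas translate directly into code; in this sketch (the
    names are ours) the inner products are passed as scalars, e.g. as
    approximated in part (2) of the lemma: <verbatim|dg>
    <math|=<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>>, <verbatim|qdd>
    <math|=<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>,
    <verbatim|qdg> <math|=<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>,
    <verbatim|qgg> <math|=<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>,
    <verbatim|ng> <math|=<around*|\||g<rsub|1>|\|>>:

    <\scm-code>
      (define (subspace-beta dg qdd qdg qgg)

      \ \ (/ (- qdg (* dg qgg)) (- qdd (* dg qdg))))

      (define (subspace-step beta dg qdd qdg qgg ng)

      \ \ (* ng (/ (- 1 (* beta dg))

      \ \ \ \ \ \ \ \ \ \ \ (+ qgg (* -2 beta qdg) (* beta beta qdd)))))
    </scm-code>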

    <\proof>
      Since <math|f<around*|(|x|)>=f<around*|(|x<rsub|1>-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)>=f<rsub|1>+<around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)><rsup|T>g<rsub|1>+<frac|1|2><around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)><rsup|T>G<around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)>+O<around*|(|\<mu\><rsup|3>,\<nu\><rsup|3>|)>>,
      we obtain the second-order approximation
      <math|m<around*|(|\<mu\>,\<nu\>|)>>:

      <\eqnarray>
        <tformat|<table|<row|<cell|m<around*|(|\<mu\>,\<nu\>|)>>|<cell|\<assign\>>|<cell|f<rsub|1>+<around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)><rsup|T>g<rsub|1>+<frac|1|2><around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)><rsup|T>G<around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)>>>|<row|<cell|>|<cell|=>|<cell|f<rsub|1>-\<mu\><around*|\||g<rsub|1>|\|>+<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>\<nu\><around*|\||g<rsub|1>|\|>+<frac|1|2><around*|[|\<mu\>,\<nu\>|]><matrix|<tformat|<table|<row|<cell|<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>|<cell|-<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>>|<row|<cell|-<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>>>>><around*|[|\<mu\>,\<nu\>|]><rsup|T>>>|<row|<cell|>|<cell|\<equiv\>>|<cell|f<rsub|1>-\<mu\><around*|\||g<rsub|1>|\|>+<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>\<nu\><around*|\||g<rsub|1>|\|>+<frac|1|2><around*|[|\<mu\>,\<nu\>|]><wide|G|^><around*|[|\<mu\>,\<nu\>|]><rsup|T>,>>>>
      </eqnarray>

      where

      <\equation*>
        <wide|G|^>\<assign\><matrix|<tformat|<table|<row|<cell|<around*|\<langle\>|-<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>,-<wide|g|^><rsub|1>|\<rangle\>>>>|<row|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>,-<wide|g|^><rsub|1>|\<rangle\>>>|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>>>>>.
      </equation*>

      \;

      Since <math|G> is assumed <em|positive-definite>, so is
      <math|<wide|G|^>> (i.e., <math|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>\<gtr\>0>
      and <math|det <wide|G|^>=<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2><around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>-<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>><rsup|2>\<gtr\>0>),
      so one can calculate the minimum of <math|m<around*|(|\<mu\>,\<nu\>|)>>,
      which is attained at

      <\equation*>
        <around*|[|\<mu\>,\<nu\>|]><rsup|T>=<around*|\||g<rsub|1>|\|><wide|G|^><rsup|-1><around*|[|1,-<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>|]><rsup|T>.
      </equation*>

      Then <math|d<rsub|1>=-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>=<wide|s|^><around*|(|-<wide|g|^><rsub|1>+<wide|\<beta\>|^><wide|d|^><rsub|0>|)>>
      gives

      <\equation*>
        <wide|s|^>\<equiv\>\<mu\>,<wide|\<beta\>|^>\<equiv\><frac|\<nu\>|\<mu\>>.
      </equation*>

      Note that <math|<wide|\<beta\>|^>> is the desired result, while
      <math|<wide|s|^>> differs slightly from the one in the lemma, which
      is obtained from the Taylor expansion of <math|f> along the direction
      <math|-<wide|g|^><rsub|1>+<wide|\<beta\>|^><wide|d|^><rsub|0>> with
      <math|<wide|\<beta\>|^>> fixed,

      <\eqnarray>
        <tformat|<table|<row|<cell|>|<cell|>|<cell|f<around*|(|x<rsub|1>+<wide|s|^><around*|(|-<wide|g|^><rsub|1>+<wide|\<beta\>|^><wide|d|^><rsub|0>|)>|)>>>|<row|<cell|>|<cell|=>|<cell|f<rsub|1>+<wide|s|^><around*|(|-<wide|g|^><rsub|1>+\<beta\><wide|d|^><rsub|0>|)>\<cdot\><wide|g|^><rsub|1><around*|\||g<rsub|1>|\|>+<frac|1|2><wide|s|^><rsup|2><around*|\<langle\>|-<wide|g|^><rsub|1>+<wide|\<beta\>|^><wide|d|^><rsub|0>|\<rangle\>><rsup|2>+o<around*|(|<wide|s|^><rsup|2><around*|\<langle\>|<around*|(|-<wide|g|^><rsub|1>+<wide|\<beta\>|^><wide|d|^><rsub|0>|)>|\<rangle\>><rsup|2>|)>.>>>>
      </eqnarray>

      Since <math|<around*|\<langle\>|-<wide|g|^><rsub|1>+<wide|\<beta\>|^><wide|d|^><rsub|0>|\<rangle\>><rsup|2>\<gtr\>0>,
      one can safely take the minimum point to be the lemma's <math|<wide|s|^>>.

      (2) To get <math|<around*|\<langle\>|d<rsub|0>|\<rangle\>><rsup|2>,<around*|\<langle\>|d<rsub|0>,g<rsub|1>|\<rangle\>>,<around*|\<langle\>|g<rsub|1>|\<rangle\>><rsup|2>>
      without the Hessian matrix, one can use the Taylor expansion of
      <math|g<around*|(|x|)>=\<nabla\>f<around*|(|x|)>> at <math|x<rsub|1>>,

      <\equation*>
        g<around*|(|x|)>=g<rsub|1>+G<around*|(|x-x<rsub|1>|)>+o<around*|(|<around*|\<\|\|\>|x-x<rsub|1>|\<\|\|\>>|)>.
      </equation*>

      By taking <math|x=x<rsub|0>=x<rsub|1>-s<rsub|0>d<rsub|0>>, one gets

      <\equation*>
        s<rsub|0>G d<rsub|0>=g<rsub|1>-g<rsub|0>+o<around*|(|s<rsub|0><around*|\<\|\|\>|d<rsub|0>|\<\|\|\>>|)>,
      </equation*>

      which gives the expression of <math|<around*|\<langle\>|d<rsub|0>|\<rangle\>><rsup|2>,<around*|\<langle\>|d<rsub|0>,g<rsub|1>|\<rangle\>>>,
      thus <math|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>,<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>.

      To get <math|<around*|\<langle\>|g<rsub|1>,g<rsub|1>|\<rangle\>>>, one
      needs another point near <math|x<rsub|1>>, such as
      <math|<wide|x|~>=x<rsub|1>-<wide|s|^>g<rsub|1>>, then

      <\equation*>
        <wide|s|^>G g<rsub|1>=g<rsub|1>-g<rsub|<wide|s|^>>+o<around*|(|<wide|s|^><around*|\<\|\|\>|g<rsub|1>|\<\|\|\>>|)>,
      </equation*>

      where <math|g<rsub|<wide|s|^>>=g<around*|(|<wide|x|~>|)>>, which gives
      the expressions for <math|<around*|\<langle\>|g<rsub|1>,g<rsub|1>|\<rangle\>>>
      and <math|<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>.
    </proof>
  </example>
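  The closed-form minimizer above can be sanity-checked numerically. The
  sketch below is illustrative only: the quadratic G, the points x1, d0,
  and the helper functions are all made up for the check. It builds Ghat
  for f(x) = (1/2) x^T G x and verifies that the point given by [mu, nu]^T
  = |g1| Ghat^{-1} [1, -d0hat.g1hat]^T makes the gradient orthogonal to
  span{-g1hat, d0hat}.

```python
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def matvec(M, v): return [dot(row, v) for row in M]
def scale(a, v): return [a * x for x in v]
def norm(v): return dot(v, v) ** 0.5

G = [[3.0, 1.0], [1.0, 2.0]]            # positive-definite Hessian (made up)
grad = lambda x: matvec(G, x)           # gradient of f(x) = 1/2 x^T G x

x1 = [1.0, 1.0]                         # current point (made up)
d0 = [-1.0, 1.0]                        # previous step (made up)
g1 = grad(x1)
g1h = scale(1 / norm(g1), g1)           # g1hat
d0h = scale(1 / norm(d0), d0)           # d0hat

# entries of Ghat, with <a,b> := a^T G b
a = dot(g1h, matvec(G, g1h))
b = -dot(d0h, matvec(G, g1h))
c = dot(d0h, matvec(G, d0h))

# [mu, nu]^T = |g1| * Ghat^{-1} [1, -d0h.g1h]^T  (2x2 solve by Cramer's rule)
r0, r1 = norm(g1), -norm(g1) * dot(d0h, g1h)
det = a * c - b * b
mu = (c * r0 - b * r1) / det
nu = (-b * r0 + a * r1) / det

# at x1 - mu*g1h + nu*d0h the gradient must be orthogonal to the subspace
x = [p - mu * q + nu * r for p, q, r in zip(x1, g1h, d0h)]
gx = grad(x)
assert abs(dot(gx, g1h)) < 1e-12 and abs(dot(gx, d0h)) < 1e-12
```

  For a quadratic objective the stationarity check is exact up to rounding.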

  <\example>
    [This method does not need the new point used in Ex. 6]

    As before, given the information <math|x<rsub|k-1>,g<rsub|k-1>> and
    <math|x<rsub|k>=x<rsub|k-1>+d<rsub|k-1>,g<rsub|k>>, we want to find the
    next point <math|x<rsub|k+1>=x<rsub|k>+d<rsub|k>>, where
    <math|d<rsub|k>=\<mu\><around*|(|-g<rsub|k>|)>+\<nu\>d<rsub|k-1>>.

    By the Taylor expansion of <math|g<around*|(|x|)>> at <math|x<rsub|k>>,
    we get

    <\equation*>
      g<around*|(|x<rsub|k>+\<mu\><around*|(|-g<rsub|k>|)>+\<nu\>d<rsub|k-1>|)>\<simeq\>g<rsub|k>+G\<cdot\><around*|(|-\<mu\>g<rsub|k>+\<nu\>d<rsub|k-1>|)>=g<rsub|k>-\<mu\>G\<cdot\>g<rsub|k>+\<nu\><around*|(|-g<rsub|k-1>+g<rsub|k>|)>,
    </equation*>

    where we have used <math|g<rsub|k-1>=g<around*|(|x<rsub|k>-d<rsub|k-1>|)>\<simeq\>g<rsub|k>-G\<cdot\>d<rsub|k-1>\<Rightarrow\>G\<cdot\>d<rsub|k-1>=-g<rsub|k-1>+g<rsub|k>>.

    We hope <math|g<around*|(|x<rsub|k+1>|)>\<simeq\>0>, which means
    <math|g<rsub|k>-\<mu\>G\<cdot\>g<rsub|k>+\<nu\><around*|(|-g<rsub|k-1>+g<rsub|k>|)>=0>.
    Note that we cannot solve for <math|\<mu\>,\<nu\>> from this equation,
    since we do not know <math|G\<cdot\>g<rsub|k>>, but we do have some
    information: taking <math|<around*|\<langle\>|d<rsub|k-1>,\<cdot\>|\<rangle\>>>
    of this equation, we get

    <\equation*>
      <around*|\<langle\>|-g<rsub|k-1>+g<rsub|k>,g<rsub|k>|\<rangle\>>\<mu\>-<around*|\<langle\>|-g<rsub|k-1>+g<rsub|k>,d<rsub|k-1>|\<rangle\>>\<nu\>=<around*|\<langle\>|g<rsub|k>,d<rsub|k-1>|\<rangle\>>.
    </equation*>

    This gives one condition on <math|<around*|(|\<mu\>,\<nu\>|)>>, and we
    need one more.

    An additional requirement may be as follows: we want the direction
    <math|d<rsub|k>> to be closer to <math|-g<rsub|k>> than to
    <math|d<rsub|k-1>>; more precisely, we require

    <\equation*>
      d<rsub|k>=\<mu\><around*|\<\|\|\>|g<rsub|k>|\<\|\|\>><frac|-g<rsub|k>|<around*|\<\|\|\>|g<rsub|k>|\<\|\|\>>>+\<nu\><around*|\<\|\|\>|d<rsub|k-1>|\<\|\|\>><frac|d<rsub|k-1>|<around*|\<\|\|\>|d<rsub|k-1>|\<\|\|\>>>//K<frac|-g<rsub|k>|<around*|\<\|\|\>|g<rsub|k>|\<\|\|\>>>+<around*|(|\<pm\>1|)><frac|d<rsub|k-1>|<around*|\<\|\|\>|d<rsub|k-1>|\<\|\|\>>>,
    </equation*>

    i.e.,

    <\equation*>
      <frac|\<mu\><around*|\<\|\|\>|g<rsub|k>|\<\|\|\>>|K>=<frac|\<nu\><around*|\<\|\|\>|d<rsub|k-1>|\<\|\|\>>|\<pm\>1>,K=const\<gtr\>1
    </equation*>

    <small|[note: this is a relative requirement: even if
    <math|-g<rsub|k>//d<rsub|k-1>>, it can still be fulfilled.]>

    Now we can solve <math|<around*|(|\<mu\>,\<nu\>|)>> as follows,

    <\align>
      <tformat|<table|<row|<cell|\<mu\>=>|<cell|<frac|<around*|\<langle\>|g<rsub|k>,d<rsub|k-1>|\<rangle\>>|<around*|\<langle\>|-g<rsub|k-1>+g<rsub|k>,g<rsub|k>\<mp\><frac|<around*|\<\|\|\>|g<rsub|k>|\<\|\|\>>|K<around*|\<\|\|\>|d<rsub|k-1>|\<\|\|\>>>d<rsub|k-1>|\<rangle\>>>,>>|<row|<cell|\<nu\>=>|<cell|\<pm\><frac|<around*|\<\|\|\>|g<rsub|k>|\<\|\|\>>|K<around*|\<\|\|\>|d<rsub|k-1>|\<\|\|\>>>\<mu\>.>>>>
    </align>

    Note that <math|K\<gtr\>1> may be chosen as 2,3,<text-dots>, so that
    <math|\<mu\>\<gtr\>0> always holds; we choose the sign such that
    <math|\<mu\>> is the smaller one.
  </example>
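  The closed-form (mu, nu) above can be checked against the linear
  condition it must satisfy. The sketch below uses hypothetical data (the
  vectors g_{k-1}, g_k, d_{k-1} and the constant K are made up): it
  verifies, for both signs, that with dg := g_k - g_{k-1} the condition
  <dg, g_k> mu - <dg, d_{k-1}> nu = <g_k, d_{k-1}> holds, and then picks
  the sign giving the smaller mu.

```python
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(v): return dot(v, v) ** 0.5

g_prev = [1.0, -2.0, 0.5]       # g_{k-1} (made up)
g = [0.6, -1.1, 0.9]            # g_k (made up)
d = [-0.4, 1.5, -0.3]           # d_{k-1} (made up)
dg = [a - b for a, b in zip(g, g_prev)]   # g_k - g_{k-1}
K = 2.0
ratio = norm(g) / (K * norm(d))

mus = []
for sign in (+1.0, -1.0):
    denom = dot(dg, [gi - sign * ratio * di for gi, di in zip(g, d)])
    mu = dot(g, d) / denom
    nu = sign * ratio * mu
    # the linear condition obtained by taking <d_{k-1}, .> of the equation
    assert abs(dot(dg, g) * mu - dot(dg, d) * nu - dot(g, d)) < 1e-12
    mus.append(mu)

mu_best = min(m for m in mus if m > 0)    # sign giving the smaller mu
```

  The constraint holds identically in exact arithmetic, so the assertion
  only exercises rounding.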

  <\example>
    [Newton-like method; this method also does not need a new point]

    This method only uses <math|<around*|(|x<rsub|0>,g<rsub|0>|)>> and
    <math|<around*|(|x<rsub|1>,g<rsub|1>|)>>. To get the next point,
    <math|x<rsub|2>=x<rsub|1>+d<rsub|1>>, we need the step
    <math|d<rsub|1>>.

    At <math|x<rsub|1>>, we use Taylor expansion of gradient of <math|f>:

    <\equation*>
      g<around*|(|x<rsub|1>+d|)>\<simeq\>g<rsub|1>+H\<cdot\>d.
    </equation*>

    Let <math|\<Delta\>x\<assign\>x<rsub|1>-x<rsub|0>,\<Delta\>g\<assign\>g<rsub|1>-g<rsub|0>>,
    then by <math|g<around*|(|x<rsub|0>|)>=g<around*|(|x<rsub|1>-\<Delta\>x|)>\<simeq\>g<rsub|1>-H\<cdot\>\<Delta\>x>,
    we get

    <\equation*>
      H\<cdot\>\<Delta\>x=\<Delta\>g.
    </equation*>

    This is a constraint on the Hessian matrix.

    On the <math|x<rsub|0>x<rsub|1>>-line, there exists a point,
    <math|x<rsub|t>=x<rsub|1>-t\<Delta\>x>, at which the function value is
    minimal along the line, i.e., the gradient
    <math|g<rsub|t>=g<rsub|1>-t\<Delta\>g> is orthogonal to
    <math|\<Delta\>x>; this gives

    <\equation*>
      t=<frac|<around*|\<langle\>|g<rsub|1>,\<Delta\>x|\<rangle\>>|<around*|\<langle\>|\<Delta\>g,\<Delta\>x|\<rangle\>>>.
    </equation*>

    <small|[note: <math|<around*|\<langle\>|\<Delta\>g,\<Delta\>x|\<rangle\>>=0>
    means <math|<around*|\<langle\>|\<Delta\>x<around*|\||H|\|>\<Delta\>x|\<rangle\>>=0>,
    which means <math|H> is degenerate, which means we should not use Newton
    method, so we here assume <math|<around*|\<langle\>|\<Delta\>g,\<Delta\>x|\<rangle\>>\<neq\>0>.]>

    Newton's method gives the next point
    <math|x<rsub|\<ast\>>\<equiv\>x<rsub|t>+d<rsub|\<ast\>>> (starting from
    the point <math|x<rsub|t>> above), with

    <\equation*>
      H\<cdot\>d<rsub|\<ast\>>=-g<rsub|t>.
    </equation*>

    Taking <math|<around*|\<langle\>|\<Delta\>x,\<cdot\>|\<rangle\>>> of
    the above equation, we find

    <\equation*>
      <around*|\<langle\>|d<rsub|\<ast\>>,\<Delta\>g|\<rangle\>>=0.
    </equation*>

    So the best next point from Newton's method lies in the set:

    <\equation*>
      \<Sigma\>\<assign\><around*|{|x<rsub|t>+\<sigma\>:\<sigma\>\<perp\>\<Delta\>g|}>.
    </equation*>

    Our task is to find a point <math|p> whose normal line (the line
    through <math|p> whose direction is <math|p>'s gradient) intersects
    <math|\<Sigma\>>, and to take their intersection point as the \Pbest
    point\Q. So which point <math|p> is best?

    Obviously (see the figure below), the best choice of <math|p> is a
    point sitting on an axis of the ellipses, since its normal line
    contains the center of the ellipses! So how can we find such a point?

    Note that the two points sitting on the axes are separated by
    <math|x<rsub|t>>.

    <\big-figure|<with|gr-mode|<tuple|edit|math-at>|gr-frame|<tuple|scale|1cm|<tuple|0.5gw|0.5gh>>|gr-geometry|<tuple|geometry|1par|0.6par>|gr-grid|<tuple|empty>|gr-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-aspect|<tuple|<tuple|axes|none>|<tuple|1|none>|<tuple|10|none>>|gr-edit-grid|<tuple|empty>|gr-edit-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-auto-crop|true|gr-arrow-end|\|\<gtr\>|gr-line-width|2ln|<graphics||<cspline|<point|-3|0>|<point|0.0|1.4>|<point|3.0|0.0>|<point|0.0|-1.5>>|<with|arrow-end|\<gtr\>|<line|<point|-4|0>|<point|4.0|0.0>>>|<with|arrow-end|\<gtr\>|<line|<point|0|-2>|<point|0.0|3.0>>>|<line|<point|2|3>|<point|-4.0|-1.0>>|<point|-2.5|0>|<point|0|1.66667>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|0|1.66667>|<point|0.0|0.914567065919462>>>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|-2.5|0>|<point|-1.46386124180286|0.0>>>|<with|point-style|star|<point|0.0154631|0>>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|-1.0656|0.956267>|<point|-0.697496361952639|0.450224897473211>>>|<point|-1.0656|0.956267>|<with|arrow-end|\<gtr\>|dash-style|10|<spline|<point|-1.70251|2.32377>|<point|-0.580665431935441|2.36610332054505>|<point|0.233502532121533|1.82233502141436>>>|<math-at|t\<pm\>\<delta\>|<point|-2.46451250165366|2.32376967852891>>|<with|arrow-end|\<gtr\>|dash-style|10|<spline|<point|-2.58208757772192|2.17682563831195>|<point|-2.63384706971822|1.32892909114962>|<point|-2.32315968118072|0.117893545879523>>>|<math-at|x<rsub|0>x<rsub|1><text|-line>|<point|-4.6556918904617|-1.42275763989946>>|<line|<point|-0.882363160183357|1.07842455987776>|<point|-0.760996824976849|0.894728138642678>|<point|-0.932915651267251|0.773862209975949>>|<math-at|-g<rsub|<wide|t|^>>|<point|-0.79233364201614|0.249421219738061>>|<math-at|x<rsub|\<ast\>>|<point|0.0|-0.268826993993889>>|<with|dash-style|wave|<line|<point|-2.2|0.2>|<point|-2.65|-0.1>>>|<with|dash-style|wave|<line|<point|0.4|1.93333>|<point|-0.2|1.53333333333333>>>|<line|<point|-1.75
583741235613|1.62220531816378>|<point|2.13885765312872|-1.91265379018389>>|<math-at|\<Sigma\>|<point|2.26585857917714|-1.72215240111126>>|<math-at|x<rsub|t>|<point|-1.13916419243302|1.25130577468191>>|<point|-1.55796|0.628024>|<with|dash-style|10|<line|<point|-2.2003406535256|1.18441592803281>|<point|1.69435441195925|-2.22344225426644>>>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|-1.55796|0.628024>|<point|-1.1512016811253|0.266419327182547>>>|<math-at|-g<rsub|\<tau\>>|<point|-1.10263899912756|0.0>>|<math-at|x<rsub|\<tau\>>|<point|-2.09450654848525|0.591744939806853>>>>>
      note: the point on the minor axis is better!
    </big-figure>

    We assume that <math|<wide|t|^>\<pm\>\<delta\>> are exactly these two
    points, which means we require:

    <\equation*>
      <around*|\<langle\>|g<rsub|t+\<delta\>>,g<rsub|t-\<delta\>>|\<rangle\>>=0\<Rightarrow\>\<delta\>=<frac|<around*|\<\|\|\>|g<rsub|t>|\<\|\|\>>|<around*|\<\|\|\>|\<Delta\>g|\<\|\|\>>>.
    </equation*>

    Since we do not obtain the axis points exactly, the normal lines of the
    above two points (which do not sit exactly on the axes) intersect
    <math|\<Sigma\>> at two different points; which one is better?

    To answer this question, note that there exists a point,
    <math|x<rsub|\<tau\>>=x<rsub|1>-\<tau\>\<Delta\>x>, on the
    <math|x<rsub|0>x<rsub|1>>-line whose normal line is parallel to
    <math|\<Sigma\>>, i.e., <math|g<rsub|\<tau\>>\<perp\>\<Delta\>g>; this
    gives

    <\equation*>
      \<tau\>=<frac|<around*|\<langle\>|g<rsub|1>,\<Delta\>g|\<rangle\>>|<around*|\<\|\|\>|\<Delta\>g|\<\|\|\>><rsup|2>>.
    </equation*>

    For a point on the <math|x<rsub|0>x<rsub|1>>-line, its normal line
    intersects <math|\<Sigma\>> far away from <math|x<rsub|\<ast\>>> if the
    point is near <math|x<rsub|\<tau\>>>, so we should choose the one of
    <math|<around*|{|x<rsub|t+\<delta\>>,x<rsub|t-\<delta\>>|}>> that is
    farther from <math|x<rsub|\<tau\>>>; that is, pick
    <math|x<rsub|\<xi\>>>, where

    <\equation*>
      \<xi\>=<choice|<tformat|<table|<row|<cell|t+\<delta\>,>|<cell|t\<gtr\>\<tau\>,>>|<row|<cell|t-\<delta\>,>|<cell|t\<less\>\<tau\>.>>>>>
    </equation*>

    Now we have a point <math|x<rsub|\<xi\>>>, where <math|\<xi\>> is
    chosen as above, and we consider the intersection point of the normal
    line and <math|\<Sigma\>>:

    <\equation*>
      <around*|{|x<rsub|\<xi\>>+\<lambda\>g<rsub|\<xi\>>:\<lambda\>\<in\>\<bbb-R\>|}>\<cap\><around*|{|x<rsub|t>+\<sigma\>:\<sigma\>\<perp\>\<Delta\>g|}>.
    </equation*>

    The normal line intersects <math|\<Sigma\>> at the point
    <math|x<rsub|\<xi\>>+\<lambda\>g<rsub|\<xi\>>>, with [taking
    <math|<around*|\<langle\>|\<Delta\>g,\<cdot\>|\<rangle\>>>]

    <\equation*>
      \<lambda\>=<frac|<around*|\<langle\>|x<rsub|t>-x<rsub|\<xi\>>,\<Delta\>g|\<rangle\>>|<around*|\<langle\>|g<rsub|\<xi\>>,\<Delta\>g|\<rangle\>>>=<frac|<around*|(|\<xi\>-t|)><around*|\<langle\>|\<Delta\>x,\<Delta\>g|\<rangle\>>|<around*|\<langle\>|g<rsub|1>-\<xi\>\<Delta\>g,\<Delta\>g|\<rangle\>>>.
    </equation*>

    Now we get the \Pbest point\Q:

    <\align>
      <tformat|<table|<row|<cell|x<rsub|2>\<assign\>>|<cell|x<rsub|\<xi\>>+\<lambda\>g<rsub|\<xi\>>>>|<row|<cell|=>|<cell|x<rsub|1>-\<xi\>\<Delta\>x+\<lambda\><around*|(|g<rsub|1>-\<xi\>\<Delta\>g|)>,>>>>
    </align>

    or the step

    <\equation*>
      d<rsub|1>\<assign\>-\<xi\>\<Delta\>x+\<lambda\><around*|(|g<rsub|1>-\<xi\>\<Delta\>g|)>.
    </equation*>
  </example>
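  The identities used in this construction (g_t orthogonal to dx,
  <g_{t+delta}, g_{t-delta}> = 0, g_tau orthogonal to dg, and x2 lying in
  Sigma) all hold exactly for a quadratic objective, which gives a cheap
  numerical test. The sketch below is illustrative: the Hessian H and the
  points x0, x1 are made up, and the method is walked through step by step
  (t, delta, tau, xi, lambda).

```python
def dot(a, b): return sum(p * q for p, q in zip(a, b))
def sub(a, b): return [p - q for p, q in zip(a, b)]
def axpy(a, x, y): return [a * p + q for p, q in zip(x, y)]
def norm(v): return dot(v, v) ** 0.5

H = [[3.0, 1.0], [1.0, 2.0]]                  # positive-definite Hessian (made up)
g = lambda x: [dot(row, x) for row in H]      # gradient of f = 1/2 x^T H x

x0, x1 = [2.0, 0.0], [1.0, 1.0]               # made-up iterates
dx, dg = sub(x1, x0), sub(g(x1), g(x0))       # Delta x, Delta g
g1 = g(x1)

t = dot(g1, dx) / dot(dg, dx)
xt, gt = axpy(-t, dx, x1), axpy(-t, dg, g1)
assert abs(dot(gt, dx)) < 1e-12               # g_t orthogonal to Delta x

delta = norm(gt) / norm(dg)
gp, gm = axpy(-(t + delta), dg, g1), axpy(-(t - delta), dg, g1)
assert abs(dot(gp, gm)) < 1e-9                # <g_{t+delta}, g_{t-delta}> = 0

tau = dot(g1, dg) / dot(dg, dg)
assert abs(dot(axpy(-tau, dg, g1), dg)) < 1e-12   # g_tau orthogonal to Delta g

xi = t + delta if t > tau else t - delta
gxi = axpy(-xi, dg, g1)
lam = (xi - t) * dot(dx, dg) / dot(gxi, dg)
x2 = axpy(lam, gxi, axpy(-xi, dx, x1))        # x2 = x1 - xi*Dx + lam*g_xi
assert abs(dot(sub(x2, xt), dg)) < 1e-9       # x2 lies on Sigma
```

  The assertions only check the exact identities of the construction; they
  do not show that x2 is close to the true minimizer.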

  <\example>
    [Newton-like method: version 2]

    [note: this method is an alternative similar to the method above, but
    it turns out not to be good enough. The reason may be that one of
    <math|t\<pm\>\<delta\>> is not a good choice.]

    In the method above, we obtained <math|<wide|t|^>> and
    <math|\<delta\>>. Since we assume <math|<wide|t|^>\<pm\>\<delta\>>
    approximate the points on the axes, i.e., we assume their
    intersections with <math|\<Sigma\>> coincide at exactly
    <math|x<rsub|\<ast\>>>, why not simply find the intersection point of
    their normal lines? From this consideration, we have the following
    figure.

    <\big-figure|<with|gr-mode|<tuple|edit|math-at>|gr-frame|<tuple|scale|1cm|<tuple|0.5gw|0.799999gh>>|gr-geometry|<tuple|geometry|1par|0.6par>|gr-grid|<tuple|empty>|gr-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-aspect|<tuple|<tuple|axes|none>|<tuple|1|none>|<tuple|10|none>>|gr-edit-grid|<tuple|empty>|gr-edit-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-dash-style|10|gr-auto-crop|true|<graphics||<line|<point|-4.0|-4.0>|<point|1.0|1.0>>|<point|-3|-3>|<point|-1.5|-1.5>|<point|0|0>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|-1.5|-1.5>|<point|-1.0|-2.0>>>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|0|0>|<point|0.0|-1.0>>>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|-3|-3>|<point|-2.0|-3.0>>>|<point|0|-3>|<with|dash-style|10|<line|<point|0|0>|<point|0.0|-3.0>|<point|-3.0|-3.0>>>|<with|dash-style|10|<line|<point|-1.5|-1.5>|<point|0.0|-3.0>>>|<math-at|x<rsub|0>x<rsub|1><text|-line>|<point|-3.7|-4>>|<math-at|<wide|t|^>|<point|-2|-1.3>>|<math-at|<wide|t|^>-\<delta\>|<point|-4.0|-3.0>>|<math-at|<wide|t|^>+\<delta\>|<point|-1.0|0.4>>|<math-at|\<delta\><around*|\<\|\|\>|\<Delta\>x|\<\|\|\>>|<point|-1.7|-0.5>>|<math-at|x<rsub|2>|<point|0.0|-3.5>>>>>
      <math|<around*|\<\|\|\>|x<rsub|<wide|t|^>>-x<rsub|2>|\<\|\|\>>=\<delta\><around*|\<\|\|\>|\<Delta\>x|\<\|\|\>>>.
    </big-figure>

    From the figure, the next point is chosen as

    <\equation*>
      x<rsub|2>=x<rsub|<wide|t|^>>-\<delta\><around*|\<\|\|\>|\<Delta\>x|\<\|\|\>>g<rsub|<wide|t|^>>/<around*|\<\|\|\>|g<rsub|<wide|t|^>>|\<\|\|\>>=x<rsub|<wide|t|^>>-<frac|<around*|\<\|\|\>|\<Delta\>x|\<\|\|\>>|<around*|\<\|\|\>|\<Delta\>g|\<\|\|\>>>g<rsub|<wide|t|^>>,
    </equation*>

    i.e., the step from <math|x<rsub|1>> to <math|x<rsub|2>> is

    <\equation*>
      d<rsub|1>=-<wide|t|^>\<Delta\>x-<frac|<around*|\<\|\|\>|\<Delta\>x|\<\|\|\>>|<around*|\<\|\|\>|\<Delta\>g|\<\|\|\>>><around*|(|g<rsub|1>-<wide|t|^>\<Delta\>g|)>,with
      <wide|t|^>=<frac|<around*|\<langle\>|g<rsub|1>,\<Delta\>x|\<rangle\>>|<around*|\<langle\>|\<Delta\>g,\<Delta\>x|\<rangle\>>>.
    </equation*>

    Numerical experiments indicate that this method performs noticeably
    worse than the previous one.
  </example>

  <\example>
    BB-2 method, and its improvement

    (1) BB-2 method.

    In Example 8, we obtained

    <\equation*>
      H\<cdot\>\<Delta\>x=\<Delta\>g,<text| or >\<Delta\>x=H<rsup|-1>\<cdot\>\<Delta\>g.
    </equation*>

    The BB-2 method approximates the inverse Hessian matrix as
    <math|H<rsup|-1>\<sim\>\<alpha\>I>, in the sense that
    <math|<around*|\<\|\|\>|\<Delta\>x-\<alpha\>\<Delta\>g|\<\|\|\>><rsup|2>>
    is minimized. This gives

    <\equation*>
      \<alpha\>=<frac|<around*|\<langle\>|\<Delta\>x,\<Delta\>g|\<rangle\>>|<around*|\<langle\>|\<Delta\>g,\<Delta\>g|\<rangle\>>>.
    </equation*>

    Then Newton's method gives:

    <\equation*>
      x<rsub|2>\<assign\>x<rsub|1>-H<rsup|-1>g<rsub|1>=x<rsub|1>-\<alpha\>g<rsub|1>=x<rsub|1>-<frac|<around*|\<langle\>|\<Delta\>x,\<Delta\>g|\<rangle\>>|<around*|\<langle\>|\<Delta\>g,\<Delta\>g|\<rangle\>>>g<rsub|1>,
    </equation*>

    or the step from <math|x<rsub|1>> to <math|x<rsub|2>>:

    <\equation*>
      d<rsub|1>\<assign\>-<frac|<around*|\<langle\>|\<Delta\>x,\<Delta\>g|\<rangle\>>|<around*|\<langle\>|\<Delta\>g,\<Delta\>g|\<rangle\>>>g<rsub|1>.
    </equation*>

    (2) An improvement to BB-2 method.

    As in the BB-2 method, we take <math|H<rsup|-1>\<sim\>\<alpha\>I>, but
    we start from another point on the line <math|x<rsub|0>x<rsub|1>>,
    <math|x<rsub|t>=x<rsub|1>-t\<Delta\>x>. This point is chosen such that
    <math|g<rsub|t>=g<rsub|1>-t\<Delta\>g> is orthogonal to the line
    <math|x<rsub|0>x<rsub|1>>, which means

    <\equation*>
      t=<frac|<around*|\<langle\>|g<rsub|1>,\<Delta\>x|\<rangle\>>|<around*|\<langle\>|\<Delta\>g,\<Delta\>x|\<rangle\>>>.
    </equation*>

    Thus we have

    <\equation*>
      x<rsub|2>\<assign\>x<rsub|t>-\<alpha\>g<rsub|t>=x<rsub|1>-t\<Delta\>x-\<alpha\><around*|(|g<rsub|1>-t\<Delta\>g|)>,
    </equation*>

    or

    <\equation*>
      d<rsub|1>\<assign\>-t\<Delta\>x-\<alpha\><around*|(|g<rsub|1>-t\<Delta\>g|)>.
    </equation*>

    Remarkably, this improvement works quite well for high-dimensional
    problems.
  </example>
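  The plain and improved BB-2 steps can be compared on a toy quadratic.
  The sketch below uses a made-up Hessian H and points x0, x1; it computes
  both steps from (x0, g0), (x1, g1) only and checks that each decreases f
  from x1. A 2-D toy cannot show the high-dimensional advantage mentioned
  above; it only exercises the formulas.

```python
def dot(a, b): return sum(p * q for p, q in zip(a, b))

H = [[4.0, 1.0], [1.0, 1.0]]                  # positive-definite Hessian (made up)
f = lambda x: 0.5 * dot(x, [dot(r, x) for r in H])
g = lambda x: [dot(r, x) for r in H]          # gradient of f

x0, x1 = [2.0, 0.0], [0.5, 1.0]               # made-up iterates
dx = [a - b for a, b in zip(x1, x0)]
dg = [a - b for a, b in zip(g(x1), g(x0))]
g1 = g(x1)

# plain BB-2: alpha = <dx, dg>/<dg, dg>, step -alpha*g1
alpha = dot(dx, dg) / dot(dg, dg)
x_bb2 = [a - alpha * b for a, b in zip(x1, g1)]

# improved variant: restart from x_t = x1 - t*dx with g_t orthogonal to dx
t = dot(g1, dx) / dot(dg, dx)
gt = [a - t * b for a, b in zip(g1, dg)]
x_imp = [a - t * b - alpha * c for a, b, c in zip(x1, dx, gt)]

assert f(x_bb2) < f(x1) and f(x_imp) < f(x1)  # both steps decrease f here
```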

  <subsection|Trust region approach>

  In this subsection, we consider another approach to optimization
  problems, the trust region approach, which may save some computation
  compared with the line search approach.

  Assume we have obtained <math|x<rsub|k>>. To get the next approximation
  point <math|x<rsub|k+1>>, we first restrict to a compact region centered
  at <math|x<rsub|k>> with radius <math|R<rsub|k>> (depending on
  <math|x<rsub|k>>), and find <math|x<rsub|k+1>> in this region,

  <\equation*>
    <around*|\<\|\|\>|x<rsub|k+1>-x<rsub|k>|\<\|\|\>>\<leqslant\>R<rsub|k>,
  </equation*>

  by some methods, like Newton, CG, Subspace, etc.

  This approach is reasonable because, in those methods, <math|x<rsub|k+1>>
  is obtained from an approximating function coming from the Taylor
  expansion at <math|x<rsub|k>>. This means that when the distance
  <math|<around*|\<\|\|\>|x<rsub|k+1>-x<rsub|k>|\<\|\|\>>> is too large, the
  approximating function is not a good approximation. So we should restrict
  to a small region, defined by <math|R<rsub|k>>, where the approximating
  function is good.

  Denote the objective function by <math|f<around*|(|x|)>,x\<in\>\<bbb-R\><rsup|n>>;
  near <math|x<rsub|k>>, we have an approximating function (model
  function), <math|m<rsub|k><around*|(|d|)>>, with

  <\equation*>
    m<rsub|k><around*|(|0|)>=f<around*|(|x<rsub|k>|)>,m<rsub|k><around*|(|d<rsub|k>|)>\<approx\>f<around*|(|x<rsub|k>+d<rsub|k>|)>.
  </equation*>

  Assume we can solve the approximate problem,

  <\equation>
    d<rsub|k>\<assign\>arg min<rsub|<around*|\<\|\|\>|d|\<\|\|\>>\<leqslant\>R<rsub|k>>m<rsub|k><around*|(|d|)><label|solve-distance>,
  </equation>

  and get next point <math|x<rsub|k+1>\<assign\>x<rsub|k>+d<rsub|k>>.

  To see how well the model function <math|m<rsub|k>> approximates
  <math|f>, consider the ratio:

  <\equation*>
    \<rho\><rsub|k>\<assign\><frac|f<around*|(|x<rsub|k>|)>-f<around*|(|x<rsub|k+1>|)>|m<rsub|k><around*|(|0|)>-m<rsub|k><around*|(|d<rsub|k>|)>>.
  </equation*>

  If <math|\<rho\><rsub|k>\<sim\>1>, the model function is a good
  approximation; if <math|\<rho\><rsub|k>\<gtr\>1>, the objective function
  decreases even more than the model predicts; in other cases, the model
  function is not a good approximation, which means one should shrink the
  region in the next iteration.

  <\scm-code>
    ;;;;Trust region approach

    ;; *** assume we have (a), (b), (c):

    ;;(a) object function

    (define (f x) ...)

    ;;(b) model-function

    (define (model-func x) (lambda (d) ...))

    ;;(c) solve the distance, d, from model-function in region R

    (define (solve-func x m R) ...) ;=\<gtr\> d, so x_new = x + d

    ;; *** trust-region ***

    (define (trust-region-approach f model-func x_init R_init Rmin)

    \ \ ;;x_init: start point, R_init: start radius; stop once R \<less\>=
    Rmin

    \ \ (let loop ([x0 x_init] [R0 R_init])

    \ \ \ \ (if (\<less\>= R0 Rmin) x0 ;;region too small: stop

    \ \ \ \ \ \ \ \ (let* ([m (model-func x0)]

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [d0 (solve-func x0 m R0)]

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [x1 (vector-add x0 d0)]

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ;;m(0) = f(x0), so (- (f x0) (m d0)) is
    the predicted decrease

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [rho0 (/ (- (f x0) (f x1)) (- (f x0) (m
    d0)))])

    \ \ \ \ \ \ \ \ \ \ (cond [(\<less\> rho0 0.25) (loop x0 (/ R0 4))]
    ;;bad model: reject step, shrink region

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [(and (\<gtr\> rho0 0.75) (=
    (vector-norm d0) R0)) (loop x1 (* 2 R0))] ;;good model: expand region

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [else (loop x1 R0)])))))
  </scm-code>

  Now the key problem is to solve for <math|d<rsub|k>> in problem
  (<reference|solve-distance>) (i.e., <verbatim|solve-func> in the above
  code); in the following subsubsections, we address this problem.

  <subsubsection|Newton, Quasi-Newton, LMF and CG methods>

  We can use Newton's method to get a model function.

  Consider the Taylor expansion,

  <\equation*>
    f<around*|(|x|)>=f<around*|(|x<rsub|0>|)>+<around*|(|x-x<rsub|0>|)><rsup|T>g+<frac|1|2><around*|(|x-x<rsub|0>|)><rsup|T>G<around*|(|x-x<rsub|0>|)>+o<around*|(|<around*|\<\|\|\>|x-x<rsub|0>|\<\|\|\>><rsup|2>|)>,
  </equation*>

  where <math|g=\<nabla\>f<around*|(|x<rsub|0>|)>> and <math|G> is the
  Hessian matrix of <math|f> at <math|x<rsub|0>>.

  This gives a model function,

  <\equation*>
    m<around*|(|d|)>=f<around*|(|x<rsub|0>|)>+d<rsup|T>g+<frac|1|2>d<rsup|T>G
    d.
  </equation*>

  Assuming <math|G> is positive definite, <math|m<around*|(|d|)>> attains
  its minimum at

  <\equation*>
    d=-G<rsup|-1>g.
  </equation*>

  <\note>
    (i) The Hessian <math|G> and its inverse are hard to obtain; one can
    use the <with|font-series|bold|Quasi-Newton> method instead;

    (ii) <math|G> may not even be invertible; one can use
    <math|G+\<nu\>I> with some not-too-large <math|\<nu\>\<gtr\>0>
    instead; this is called the <with|font-series|bold|LMF>
    (Levenberg-Marquardt-Fletcher) method;

    (iii) One can also use the <with|font-series|bold|Conjugate-Gradient>
    method to get <math|d> from <math|m<around*|(|d|)>> instead of solving
    <math|G<rsup|-1>g>.

    (iv) Since <math|G<rsup|-1>> or <math|G> is hard to obtain except via
    the Quasi-Newton method, the subspace method shown below may be the
    most useful.
  </note>

  <subsubsection|Subspace method>

  In the following, we consider the <strong|subspace method>.

  Suppose that we have obtained <math|x<rsub|0>> and
  <math|x<rsub|1>=x<rsub|0>+d<rsub|0>>, and we try to get
  <math|x<rsub|2>=x<rsub|1>+d<rsub|1>>, where
  <math|d<rsub|1>\<in\>span<around*|{|-g<rsub|1>,d<rsub|0>|}>>, i.e.,
  consider

  <\equation*>
    x=x<rsub|1>-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>,
  </equation*>

  and try to find the best <math|\<mu\>,\<nu\>>, where
  <math|<wide|g|^><rsub|1>=g<rsub|1>/<around*|\||g<rsub|1>|\|>> and
  <math|<wide|d|^><rsub|0>=d<rsub|0>/<around*|\||d<rsub|0>|\|>>.

  Since <math|f<around*|(|x|)>=f<around*|(|x<rsub|1>-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)>=f<rsub|1>+<around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)><rsup|T>g<rsub|1>+<frac|1|2><around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)><rsup|T>G<around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)>+O<around*|(|\<mu\><rsup|3>,\<nu\><rsup|3>|)>>,
  we get a model function <math|m<around*|(|\<mu\>,\<nu\>|)>> and its
  2nd-order part <math|m<rsup|<around*|(|2|)>><around*|(|\<mu\>,\<nu\>|)>>:

  <\eqnarray>
    <tformat|<table|<row|<cell|m<around*|(|\<mu\>,\<nu\>|)>>|<cell|\<assign\>>|<cell|f<rsub|1>+<around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)><rsup|T>g<rsub|1>+<frac|1|2><around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)><rsup|T>G<around*|(|-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>|)>>>|<row|<cell|>|<cell|=>|<cell|f<rsub|1>-\<mu\><around*|\||g<rsub|1>|\|>+<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>\<nu\><around*|\||g<rsub|1>|\|>+<frac|1|2><around*|[|\<mu\>,\<nu\>|]><matrix|<tformat|<table|<row|<cell|<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>|<cell|-<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>>|<row|<cell|-<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>>>>><around*|[|\<mu\>,\<nu\>|]><rsup|T>>>|<row|<cell|>|<cell|\<equiv\>>|<cell|f<rsub|1>-\<mu\><around*|\||g<rsub|1>|\|>+<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>\<nu\><around*|\||g<rsub|1>|\|>+<frac|1|2><around*|[|\<mu\>,\<nu\>|]><wide|G|^><around*|[|\<mu\>,\<nu\>|]><rsup|T>,>>>>
  </eqnarray>

  where

  <\equation*>
    <around*|\<langle\>|\<xi\>,\<eta\>|\<rangle\>>\<assign\>\<xi\><rsup|T>G\<eta\>=\<xi\><rsup|T><wide|G|^>\<eta\>,<wide|G|^>\<assign\><matrix|<tformat|<table|<row|<cell|<around*|\<langle\>|-<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>,-<wide|g|^><rsub|1>|\<rangle\>>>>|<row|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>,-<wide|g|^><rsub|1>|\<rangle\>>>|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>>>>>,
  </equation*>

  and the elements of <math|<wide|G|^>> were obtained in Lemma 5,

  <\eqnarray>
    <tformat|<table|<row|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>|<cell|=>|<cell|<frac|<around*|\||g<rsub|1>|\|>|<around*|\||d<rsub|0>|\|>><around*|(|<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>-<frac|<around*|\||g<rsub|0>|\|>|<around*|\||g<rsub|1>|\|>><wide|d<rsub|0>|^>\<cdot\><wide|g|^><rsub|0>|)>,>>|<row|<cell|<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>|<cell|=>|<cell|<frac|<around*|\||g<rsub|1>|\|>|<around*|\||d<rsub|0>|\|>><around*|(|1-<frac|<around*|\||g<rsub|0>|\|>|<around*|\||g<rsub|1>|\|>><wide|g|^><rsub|0>\<cdot\><wide|g|^><rsub|1>|)>,>>|<row|<cell|<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>>|<cell|=>|<cell|<frac|<around*|\||g<rsub|1>|\|>|d<rsub|min>><around*|(|1-<frac|<around*|\||<wide|g|~>|\|>|<around*|\||g<rsub|1>|\|>><wide|<wide|g|~>|^>\<cdot\><wide|g|^><rsub|1>|)>,<wide|x|~>=x<rsub|1>+<wide|s|~><around*|(|-<wide|g|^><rsub|1>|)>,
    <wide|s|~>\<assign\>d<rsub|min>.>>>>
  </eqnarray>
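  The three expressions above are exact for a quadratic objective, so they
  can be verified numerically. The sketch below is illustrative: the
  quadratic gradient g(x) = Gx + b, the points x0, x1, and a made-up step
  s standing in for d_min are all chosen for the check; each
  finite-difference expression is compared with the true G-inner product.

```python
def dot(a, b): return sum(p * q for p, q in zip(a, b))
def sub(a, b): return [p - q for p, q in zip(a, b)]
def unit(v):
    n = dot(v, v) ** 0.5
    return n, [p / n for p in v]

G = [[3.0, 1.0], [1.0, 2.0]]                    # made-up Hessian
b = [1.0, -1.0]                                 # made-up linear term
g = lambda x: [dot(r, x) + c for r, c in zip(G, b)]      # gradient of a quadratic
quadform = lambda u, v: dot(u, [dot(r, v) for r in G])   # <u,v> := u^T G v

x0, x1 = [2.0, 0.0], [1.0, 1.0]                 # made-up iterates
d0 = sub(x1, x0)
nd0, d0h = unit(d0)
ng0, g0h = unit(g(x0))
ng1, g1h = unit(g(x1))

# <d0h>^2 and <d0h, g1h> from gradients only (exact for a quadratic)
rhs1 = (ng1 / nd0) * (dot(d0h, g1h) - (ng0 / ng1) * dot(d0h, g0h))
rhs2 = (ng1 / nd0) * (1 - (ng0 / ng1) * dot(g0h, g1h))
assert abs(quadform(d0h, d0h) - rhs1) < 1e-10
assert abs(quadform(d0h, g1h) - rhs2) < 1e-10

# <g1h>^2 needs one extra gradient at x~ = x1 - s*g1h
s = 0.1                                         # stand-in for d_min (assumed)
ngt, gth = unit(g([p - s * q for p, q in zip(x1, g1h)]))
rhs3 = (ng1 / s) * (1 - (ngt / ng1) * dot(gth, g1h))
assert abs(quadform(g1h, g1h) - rhs3) < 1e-10
```

  For a non-quadratic f the same expressions hold only up to the Taylor
  remainder.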

  Note that <math|<wide|G|^>> need not be positive-definite, and
  <math|<wide|G|^>> is not even necessarily invertible; see the figure
  below.

  <\big-figure|<with|gr-mode|<tuple|edit|math-at>|gr-frame|<tuple|scale|1cm|<tuple|0.519998gw|0.629988gh>>|gr-geometry|<tuple|geometry|1par|0.6par>|gr-grid|<tuple|empty>|gr-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-aspect|<tuple|<tuple|axes|none>|<tuple|1|none>|<tuple|10|none>>|gr-edit-grid|<tuple|empty>|gr-edit-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-auto-crop|true|gr-arrow-end|\<gtr\>|gr-line-width|2ln|<graphics||<with|arrow-end|\<gtr\>|<line|<point|-7|0>|<point|-1.0|0.0>>>|<with|arrow-end|\<gtr\>|<spline|<point|0|0>|<point|6.0|0.0>>>|<point|-6.0|1.0>|<cspline|<point|-4.0|0.7>|<point|-2.0|0.0>|<point|-4.0|-0.7>|<point|-6.0|0.0>>|<cspline|<point|-4.0|0.5>|<point|-2.3|0.0>|<point|-4.0|-0.5>|<point|-5.6|0.0>>|<point|-5.1|0.541187>|<point|-5.411|0.2>|<with|arrow-end|\<gtr\>|dash-style|10|<line|<point|-6|-0.5>|<point|-5.411|0.2>>>|<math-at|x<rsub|2>|<point|-6.3|-0.7>>|<math-at|x<rsub|1>|<point|-6.3|1.2>>|<with|opacity|40%|fill-color|dark
  red|line-width|2ln|<carc|<point|-6|2>|<point|-5.0|1.0>|<point|-6.0|0.0>>>|<math-at|\<Delta\><rsub|1>|<point|-5.8402235745469|1.4694404021696>>|<point|4|1>|<point|-4|0>|<point|3|0>|<point|3.28586|0.3>|<spline|<point|1.0|-1.5>|<point|3.0|-0.7>|<point|5.0|-1.6>>|<spline|<point|1.0|1.7>|<point|3.0|0.8>|<point|5.0|1.7>>|<with|opacity|40%|fill-color|dark
  red|line-width|2ln|<carc|<point|4|2>|<point|5.0|1.3>|<point|4.0|0.0>>>|<line|<point|0.3|-1.4>|<point|6.0|1.6>>|<line|<point|0.3|1.5>|<point|6.0|-1.6>>|<spline|<point|6.4|1.5>|<point|4.7|0.0>|<point|6.4|-1.5>>|<point|4.84626|0.4>|<math-at|x<rsub|1>|<point|3.7|1.2>>|<math-at|x<rsub|2>|<point|5.0|0.3>>|<math-at|\<Delta\><rsub|1>|<point|4.1597764254531|1.4694404021696>>|<spline|<point|-0.0872966000793756|1.39629911363937>|<point|1.47747302779174|0.0>|<point|-0.0661297790713057|-1.22838669136129>>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|-6|1>|<point|-5.411|0.2>>>|<with|arrow-end|\|\<gtr\>|line-width|2ln|<line|<point|4|1>|<point|4.84626|0.4>>>|<with|dash-style|10|<line|<point|-6|1>|<point|-4.0|0.0>>>|<with|dash-style|10|<line|<point|4|1>|<point|3.0|0.0>>>|<with|arrow-end|\<gtr\>|dash-style|10|<line|<point|-4.69185738854346|1.18810358513031>|<point|-5.0|0.7>>>|<math-at|x<rsub|2><rprime|'>|<point|-4.50135599947083|1.3362713321868>>|<with|arrow-end|\<gtr\>|dash-style|10|<line|<point|2.61069585924064|0.531932133880143>|<point|3.28586|0.3>>>|<math-at|x<rsub|2><rprime|'>|<point|2.12161253983523|0.509298443247507>>|<with|arrow-end|\<gtr\>|<spline|<point|3.0|-1.5>|<point|3.0|3.0>>>|<with|arrow-end|\<gtr\>|<line|<point|-4.0|-1.5>|<point|-4.0|3.0>>>|<spline|<point|2|-3>|<point|0.5|-4.0>|<point|2.0|-5.0>>|<point|1.235|-3.3>|<with|opacity|40%|fill-color|dark
  red|<carc|<point|2|-3>|<point|1.0|-2.4>|<point|1.6|-4.0>>>|<spline|<point|1.6|-2.0>|<point|-0.5|-4.0>|<point|1.4|-5.6>>|<point|0.7|-2.5375>|<with|arrow-end|\<gtr\>|line-width|2ln|<line|<point|1.235|-3.3>|<point|0.7|-2.5375>>>|<math-at|x<rsub|1>|<point|1.3|-3.6>>|<math-at|x<rsub|2>|<point|0.5|-2.3>>|<math-at|<around*|\<nobracket\>|a|)>|<point|-4.3|-2.0>>|<math-at|<around*|\<nobracket\>|c|)>|<point|-2|-3.3>>|<with|arrow-end|\<gtr\>|<line|<point|-2.0|-4.0>|<point|3.0|-4.0>>>|<with|arrow-end|\<gtr\>|<line|<point|-1.0|-5.6>|<point|-1.0|-2.0>>>|<math-at|\<nu\>|<point|-3.5|3>>|<math-at|<around*|\<nobracket\>|b|)>|<point|4|-1.8>>|<math-at|\<mu\>|<point|-1|-0.5>>|<math-at|\<nu\>|<point|3.3|3>>|<math-at|\<mu\>|<point|6|-0.5>>|<math-at|\<mu\>|<point|3|-4.4>>|<math-at|\<nu\>|<point|-1.3|-2>>|<math-at|\<Delta\><rsub|1>|<point|1.3265312872073|-2.82677933589099>>|<math-at|m\<gtr\>0|<point|5.15773|1.95692>>|<math-at|m=0|<point|5.48572|1.32933>>|<math-at|m\<less\>0|<point|5.60223|0.813914>>>>>
    Three cases of <math|m<around*|(|\<mu\>,\<nu\>|)>>: a)
    <math|m<around*|(|\<mu\>,\<nu\>|)>=<frac|1|2><around*|(|\<mu\><rsup|2>+<frac|\<nu\><rsup|2>|0.1<rsup|2>>|)>>;
    b) <math|m<around*|(|\<mu\>,\<nu\>|)>=<frac|1|2><around*|(|-\<mu\><rsup|2>+<frac|\<nu\><rsup|2>|0.1<rsup|2>>|)>>;
    c) <math|m<around*|(|\<mu\>,\<nu\>|)>=\<nu\><rsup|2>-\<mu\>>. Note: the
    best solution in the region <math|\<Delta\><rsub|1>> centered at
    <math|x<rsub|1>> is <math|x<rsub|2>>, not <math|x<rsub|2><rprime|'>>.
  </big-figure>

  From the figure above, we see that the solutions differ for different
  <math|<wide|G|^>>, so the cases need to be addressed separately.

  <\enumerate-alpha>
    <item><math|<wide|G|^>> is positive-definite

    a1) If the region is large enough to contain the solution given by the
    line-search approach (in the figure above, <math|<around*|(|0,0|)>> is in
    the region), then

    the best <math|d<rsub|1>> is

    <\equation*>
      d<rsub|1>=-<wide|g|^><rsub|1>\<mu\>+<wide|d|^><rsub|0>\<nu\>=<wide|s|^><around*|(|-<wide|g|^><rsub|1>+<wide|\<beta\>|^><wide|d|^><rsub|0>|)>,
    </equation*>

    where

    <\eqnarray>
      <tformat|<table|<row|<cell|<wide|\<beta\>|^>>|<cell|=>|<cell|<frac|<around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>-<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1><around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>|<around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>-<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1><around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>>,>>|<row|<cell|<wide|s|^>>|<cell|=>|<cell|<around*|\||g<rsub|1>|\|><frac|1-<wide|\<beta\>|^><wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>|<around*|\<langle\>|<wide|g|^><rsub|1>|\<rangle\>><rsup|2>-2<wide|\<beta\>|^><around*|\<langle\>|<wide|d|^><rsub|0>,<wide|g|^><rsub|1>|\<rangle\>>+<wide|\<beta\>|^><rsup|2><around*|\<langle\>|<wide|d|^><rsub|0>|\<rangle\>><rsup|2>>.>>>>
    </eqnarray>

    a2) If the region is not large enough (the case of figure a) above), the
    minimum point is at the boundary.

    <item><math|<wide|G|^>> is negative-definite

    The minimum point is at the boundary.

    <item><math|<wide|G|^>> is degenerate

    The minimum point is at the boundary.

    <item>Note that for cases a2), b) and c), the minimum points are all at
    the boundary, so the methods to solve for the minimum point are the same.
    We now address them together.

    At the boundary, <math|<around*|\||d<rsub|1>|\|><rsup|2>=R<rsub|1><rsup|2>>,
    i.e., we have the constraint

    <\equation*>
      <frac|1|2><around*|[|\<mu\>,\<nu\>|]>K<around*|[|\<mu\>,\<nu\>|]><rsup|T>-R<rsub|1><rsup|2>=0,where
      K=<matrix|<tformat|<table|<row|<cell|2>|<cell|-2<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>>>|<row|<cell|-2<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>>|<cell|2>>>>>.
    </equation*>

    By the Lagrange multiplier method, one can instead consider the minimum of

    <\eqnarray>
      <tformat|<table|<row|<cell|<wide|m|~><around*|(|\<mu\>,\<nu\>,\<lambda\>|)>>|<cell|=>|<cell|m<around*|(|\<mu\>,\<nu\>|)>-\<lambda\><around*|(|<frac|1|2><around*|[|\<mu\>,\<nu\>|]>K<around*|[|\<mu\>,\<nu\>|]><rsup|T>-R<rsub|1><rsup|2>|)>>>|<row|<cell|>|<cell|=>|<cell|f<rsub|1>-\<mu\><around*|\||g<rsub|1>|\|>+<wide|d|^><rsub|0>\<cdot\><wide|g|^><rsub|1>\<nu\><around*|\||g<rsub|1>|\|>+<frac|1|2><around*|[|\<mu\>,\<nu\>|]><wide|<with|font|cal|G>|^><around*|[|\<mu\>,\<nu\>|]><rsup|T>+\<lambda\>R<rsub|1><rsup|2>,>>|<row|<cell|<wide|<with|font|cal|G>|^>>|<cell|\<assign\>>|<cell|<around*|(|<wide|G|^>-\<lambda\>K|)>.>>>>
    </eqnarray>

    Note that <math|<wide|<with|font|cal|G>|^>> must be invertible (this
    still needs proof), so one can solve <math|<around*|[|\<mu\>,\<nu\>|]>>
    as a function of <math|\<lambda\>>, substitute the solution into the
    constraint to determine <math|\<lambda\>>, and then obtain the final
    solution:

    xxxxxxxxx
  </enumerate-alpha>
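The boundary subproblem in d) can also be solved numerically. Below is a minimal Python sketch (hypothetical names, not the notation-complete algorithm): it assumes the special case where d̂₀ and ĝ₁ are orthogonal, so K = 2I and the constraint reduces to μ² + ν² = R₁²; the stationarity condition becomes (Ĝ + tI)u = −b with t = −2λ, and t is found by bisection instead of closed-form elimination.

```python
import math

def solve2(A, b):
    """Solve the 2x2 system A u = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det)

def boundary_minimizer(G, g, R, iters=200):
    """Minimize (1/2) u.G.u + g.u over the circle |u| = R:
    bisect on t in (G + t I) u = -g, since |u(t)| decreases in t."""
    a, b, c = G[0][0], G[0][1], G[1][1]
    lam_min = 0.5 * ((a + c) - math.hypot(a - c, 2 * b))  # smallest eigenvalue
    t_lo, t_hi = max(0.0, -lam_min) + 1e-12, 1e6
    for _ in range(iters):
        t = 0.5 * (t_lo + t_hi)
        u = solve2([[a + t, b], [b, c + t]], [-g[0], -g[1]])
        if math.hypot(*u) > R:
            t_lo = t   # step too long: strengthen the multiplier
        else:
            t_hi = t   # step too short: relax the multiplier
    return u
```

For a positive-definite Ĝ whose unconstrained minimizer lies outside the region (case a2), this returns the boundary point; e.g. G = 2I, g = (−4, 0), R = 1 gives u ≈ (1, 0).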

  <subsection|Strict-inequality constrained optimization problem>

  In this subsection, we consider the strict-inequality constrained
  optimization problem, whose form is

  <\eqnarray>
    <tformat|<table|<row|<cell|>|<cell|>|<cell|min
    f<around*|(|x|)>,>>|<row|<cell|s.t.>|<cell|>|<cell|c<rsub|i><around*|(|x|)>\<gtr\>0,i\<in\><with|font|cal|I>\<equiv\><around*|{|1,\<cdots\>,m|}>.<rsub|>>>>>
  </eqnarray>

  <\note>
    We assume the problem <math|min f<around*|(|x|)>> with
    <math|c<rsub|i><around*|(|x|)>\<geqslant\>0,i\<in\><with|font|cal|I>>
    does not attain its solution at the boundary
    <math|c<rsub|i><around*|(|x|)>=0> for some <math|i>.
  </note>

  For strict-inequality constraints, the allowed region of points is an
  <with|font-series|bold|open subset>, <math|\<Omega\>>, of
  <math|\<bbb-R\><rsup|n>>, which means these constraints are not active at
  the minimum point. But this does not mean the constraints are useless
  during the iteration process, since an iterative step may be too large to
  stay in the allowed region. To address this problem, one can use a
  <with|font-series|bold|barrier method> or <strong|interior-point method> to
  keep the points inside the allowed region. That is to say, we consider the
  problem

  <\equation*>
    min P<rsub|\<sigma\>><around*|(|x|)>\<assign\>f<around*|(|x|)>-\<sigma\><big|sum><rsub|i>ln
    c<rsub|i><around*|(|x|)>,
  </equation*>

  where <math|\<sigma\>> varies as follows:

  Consider the iteration sequence <math|<around*|{|x<rsub|0>,x<rsub|1>,x<rsub|2>,\<cdots\>|}>>
  in <math|\<Omega\>>, where <math|x<rsub|k+1>=x<rsub|k>+s<rsub|k>d<rsub|k>>.
  To get <math|x<rsub|k+1>>, one first obtains a point
  <math|<wide|x|^><rsub|k+1>=x<rsub|k>+<wide|s|^><rsub|k>d<rsub|k>> by an
  iterative step of <math|min P<rsub|\<sigma\>><around*|(|x|)>>, then

  <\enumerate-alpha>
    <item>If <math|<wide|x|^><rsub|k+1>\<in\>\<Omega\>>, then
    <math|x<rsub|k+1>=<wide|x|^><rsub|k+1>>, and choose <math|\<sigma\>=0>;

    <item>If <math|<wide|x|^><rsub|k+1>\<nin\>\<Omega\>>, then take
    <math|s<rsub|k>> smaller than <math|<wide|s|^><rsub|k>> such that
    <math|x<rsub|k+1>=x<rsub|k>+s<rsub|k>d<rsub|k>\<in\>\<Omega\>>, and
    change <math|\<sigma\>=max<around*|{|2\<sigma\>,\<sigma\><rsub|0>|}>>,
    where <math|\<sigma\><rsub|0>> is an input number.
  </enumerate-alpha>

  <\note>
    One needs to make sure that all points used in the algorithm to get
    <math|<around*|{|x<rsub|0>,x<rsub|1>,\<cdots\>|}>> are in the allowed
    region <math|\<Omega\>>; otherwise <math|c<rsub|i><around*|(|x|)>\<leqslant\>0>
    for some <math|i>, and <math|ln c<rsub|i><around*|(|x|)>> is not
    well-defined.
  </note>
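The step-control logic in a) and b) can be sketched as follows; this is a minimal Python illustration with a plain gradient step standing in for the inner minimization of min P_σ(x) (all names hypothetical, scalar x for brevity).

```python
def barrier_descent(grad_f, cs, grad_cs, x0, sigma0=1.0, step=0.1, iters=500):
    """Gradient descent on P_sigma(x) = f(x) - sigma*sum(ln c_i(x)).
    If a trial point leaves Omega = {x : c_i(x) > 0}, the step is
    shrunk and sigma is raised, following cases a) and b) above."""
    in_omega = lambda x: all(c(x) > 0 for c in cs)
    assert in_omega(x0)
    x, sigma = x0, 0.0
    for _ in range(iters):
        # gradient of P_sigma: grad f - sigma * sum(grad c_i / c_i)
        g = grad_f(x) + sum(-sigma / c(x) * gc(x) for c, gc in zip(cs, grad_cs))
        s, trial = step, x - step * g
        if in_omega(trial):
            sigma = 0.0                      # case a): accept, drop the barrier
        else:
            sigma = max(2 * sigma, sigma0)   # case b): strengthen the barrier
            while not in_omega(trial):
                s *= 0.5                     # shrink the step back into Omega
                trial = x - s * g
        x = trial
    return x
```

For example, minimizing f(x) = (x − 2)² subject to x > 0 from x₀ = 0.5 converges to the interior minimum x = 2.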

  <subsection|Constrained optimization problem>

  A general constrained optimization problem has the following form,

  <\eqnarray>
    <tformat|<table|<row|<cell|>|<cell|>|<cell|min
    f<around*|(|x|)>,>>|<row|<cell|s.t.>|<cell|>|<cell|c<rsub|e><around*|(|x|)>=0,e\<in\><with|font|cal|E>\<equiv\><around*|{|1,2,\<cdots\>,m<rsub|e>|}>,>>|<row|<cell|>|<cell|>|<cell|c<rsub|i><around*|(|x|)>\<geqslant\>0,i\<in\><with|font|cal|I>\<equiv\><around*|{|m<rsub|e>+1,\<cdots\>,m-1,m|}>.<rsub|>>>>>
  </eqnarray>

  where <math|<with|font|cal|E>> is the index set of equality constraints,
  and <math|<with|font|cal|I>> is the index set of inequality constraints.

  Constrained optimization problems are much more complicated than
  unconstrained ones.

  <\definition>
    <with|font-series|bold|Feasible set>: the set of points that satisfy the
    constraints, i.e.

    <\equation*>
      \<Omega\>\<assign\><around*|{|x<around*|\||c<rsub|i><around*|(|x|)>=0,i\<in\><with|font|cal|E>|\<nobracket\>>;c<rsub|i><around*|(|x|)>\<geqslant\>0,i\<in\><with|font|cal|I>|}>.
    </equation*>
  </definition>

  <\definition>
    <with|font-series|bold|Active set> <math|<with|font|cal|A><around*|(|x|)>>
    at feasible point <math|x>: set of indices of equality constraints and
    the indices of inequality constraints for which
    <math|c<rsub|i><around*|(|x|)>=0>, i.e.,

    <\equation*>
      <with|font|cal|A><around*|(|x|)>\<assign\><with|font|cal|E>\<cup\><around*|{|i\<in\><with|font|cal|I><around*|\||c<rsub|i><around*|(|x|)>=0|\<nobracket\>>|}>.
    </equation*>

    An inequality constraint <math|i\<in\><with|font|cal|I>> is
    <strong|active> if \ <math|c<rsub|i><around*|(|x|)>=0>, and
    <strong|inactive> if <math|c<rsub|i><around*|(|x|)>\<gtr\>0>.
  </definition>
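As a small illustration, the active set can be computed directly from the definition; the helper below is hypothetical and uses 0-based indices instead of the 1-based index sets above.

```python
def active_set(x, eq_cs, ineq_cs, tol=1e-9):
    """Indices of all equality constraints, plus those inequality
    constraints with c_i(x) = 0 (up to tol), per the definition above."""
    A = set(range(len(eq_cs)))
    A |= {len(eq_cs) + i for i, c in enumerate(ineq_cs) if abs(c(x)) <= tol}
    return A
```

For the constraints c₁(x) = x₁ + x₂ − 1 = 0, c₂(x) = x₁ ≥ 0, c₃(x) = x₂ ≥ 0, the feasible point (0, 1) has active set {c₁, c₂}: c₂ is active, c₃ is inactive.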

  <subsubsection|Penalty function method>

  <\definition>
    <with|font-series|bold|Penalty function>,

    <\equation*>
      P<around*|(|x,\<sigma\>|)>\<assign\>f<around*|(|x|)>+\<sigma\><around*|(|<big|sum><rsub|e=1><rsup|m<rsub|e>><around*|[|c<rsub|e><around*|(|x|)>|]><rsup|2>+<big|sum><rsub|i=m<rsub|e>+1><rsup|m><around*|[|c<rsub|i-><around*|(|x|)>|]><rsup|2>|)>,
    </equation*>

    where

    <\equation*>
      c<rsub|i-><around*|(|x|)>=min<around*|{|0,c<rsub|i><around*|(|x|)>|}>.
    </equation*>

    Introduce

    <\equation*>
      <wide|c|\<bar\>><around*|(|x|)>=<around*|[|<wide|c|\<bar\>><rsub|1><around*|(|x|)>\<nocomma\>,\<cdots\>,<wide|c|\<bar\>><rsub|m<rsub|e>><around*|(|x|)>,<wide|c|\<bar\>><rsub|m<rsub|e>+1><around*|(|x|)>,\<cdots\>,<wide|c|\<bar\>><rsub|m><around*|(|x|)>|]>,
    </equation*>

    and

    <\equation*>
      <wide|c|\<bar\>><rsub|i><around*|(|x|)>=<choice|<tformat|<table|<row|<cell|c<rsub|i><around*|(|x|)>,>|<cell|i\<in\><with|font|cal|E>,>>|<row|<cell|c<rsub|i-><around*|(|x|)>,>|<cell|i\<in\><with|font|cal|I>.>>>>>
    </equation*>

    Then the penalty function can be rewritten as

    <\equation*>
      P<around*|(|x,\<sigma\>|)>=f<around*|(|x|)>+\<sigma\><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x|)>|\<\|\|\>><rsup|2>.
    </equation*>
  </definition>

  Let <math|x<around*|(|\<sigma\>|)>> be the solution of the problem

  <\equation*>
    min<rsub|x\<in\>\<bbb-R\><rsup|n>>f<around*|(|x|)>+\<sigma\><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x|)>|\<\|\|\>><rsup|2>\<equiv\>min<rsub|x\<in\>\<bbb-R\><rsup|n>>
    P<around*|(|x,\<sigma\>|)>,
  </equation*>

  then the following lemmas give the basis of the penalty function method.

  <\lemma>
    <math|x<around*|(|\<sigma\>|)>> is also the solution of constrained
    optimization problem,

    <\equation*>
      min<rsub|x\<in\>\<bbb-R\><rsup|n>>f<around*|(|x|)>,s.t.
      <around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x|)>|\<\|\|\>>\<leqslant\>\<delta\>,
    </equation*>

    where <math|\<delta\>=<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\>|)>|)>|\<\|\|\>>>.

    note: This means that if we can let <math|\<delta\>\<rightarrow\>0>, we
    get the solution of the original constrained optimization problem.
  </lemma>

  <\proof>
    Suppose <math|x> satisfies <math|<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x|)>|\<\|\|\>>\<leqslant\>\<delta\>=<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\>|)>|)>|\<\|\|\>>>;
    then by the definition of <math|x<around*|(|\<sigma\>|)>>,

    <\equation*>
      f<around*|(|x<around*|(|\<sigma\>|)>|)>+\<sigma\><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\>|)>|)>|\<\|\|\>><rsup|2>\<leqslant\>f<around*|(|x|)>+\<sigma\><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x|)>|\<\|\|\>><rsup|2>\<leqslant\>f<around*|(|x|)>+\<sigma\><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\>|)>|)>|\<\|\|\>><rsup|2>,
    </equation*>

    which gives the result.
  </proof>

  <\lemma>
    If <math|\<sigma\><rsub|2>\<gtr\>\<sigma\><rsub|1>\<geqslant\>0>, then

    <\equation*>
      f<around*|(|x<around*|(|\<sigma\><rsub|2>|)>|)>\<geqslant\>f<around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>,
    </equation*>

    <\equation*>
      <around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|2>|)>|)>|\<\|\|\>>\<leqslant\><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>|\<\|\|\>>.
    </equation*>

    note: This means one can use an increasing sequence <math|\<sigma\><rsub|k>>,

    <\equation*>
      \<sigma\><rsub|1>\<less\>\<sigma\><rsub|2>\<less\>\<cdots\>\<less\>\<sigma\><rsub|k>\<less\>\<cdots\>,
    </equation*>

    to give a decreasing sequence <math|\<delta\><rsub|k>=<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|k>|)>|)>|\<\|\|\>>>.
  </lemma>

  <\proof>
    Since <math|x<around*|(|\<sigma\><rsub|1>|)>> is the minimum of
    <math|P<around*|(|x,\<sigma\><rsub|1>|)>>,

    <\equation*>
      f<around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>+\<sigma\><rsub|1><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>|\<\|\|\>><rsup|2>\<leqslant\>f<around*|(|x<around*|(|\<sigma\><rsub|2>|)>|)>+\<sigma\><rsub|1><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|2>|)>|)>|\<\|\|\>><rsup|2>.
    </equation*>

    If the second claimed inequality holds, then this inequality implies the
    first, so we only need to prove the second.

    Since <math|x<around*|(|\<sigma\><rsub|2>|)>> is the minimum of
    <math|P<around*|(|x,\<sigma\><rsub|2>|)>>,

    <\equation*>
      f<around*|(|x<around*|(|\<sigma\><rsub|2>|)>|)>+\<sigma\><rsub|2><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|2>|)>|)>|\<\|\|\>><rsup|2>\<leqslant\>f<around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>+\<sigma\><rsub|2><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>|\<\|\|\>><rsup|2>.
    </equation*>

    Adding the two inequalities above,

    <\equation*>
      <around*|(|\<sigma\><rsub|1>-\<sigma\><rsub|2>|)><around*|[|<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>|\<\|\|\>><rsup|2>-<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|2>|)>|)>|\<\|\|\>><rsup|2>|]>\<leqslant\>0,
    </equation*>

    which, since <math|\<sigma\><rsub|1>-\<sigma\><rsub|2>\<less\>0>, gives
    the second inequality.
  </proof>

  <\theorem>
    If <math|min<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x|)>|\<\|\|\>>=0>,
    then a sequence <math|\<sigma\><rsub|k>> with
    <math|\<sigma\><rsub|k>\<rightarrow\>\<infty\>> gives

    <\equation*>
      <around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|k>|)>|)>|\<\|\|\>>\<rightarrow\>0.
    </equation*>
  </theorem>

  <\proof>
    Suppose it is not true, then <math|\<exists\>\<varepsilon\>\<gtr\>0>,
    s.t.,

    <\equation*>
      <around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|k>|)>|)>|\<\|\|\>>\<gtr\>\<varepsilon\>,\<forall\>k.
    </equation*>

    But since <math|min<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x|)>|\<\|\|\>>=0>,
    there exists <math|<wide|x|^>>, s.t.,

    <\equation*>
      <around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|<wide|x|^>|)>|\<\|\|\>>\<less\>\<varepsilon\>.
    </equation*>

    By the definition of <math|x<around*|(|\<sigma\>|)>> and the second lemma
    above (<math|f<around*|(|x<around*|(|\<sigma\><rsub|k>|)>|)>\<geqslant\>f<around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>>),

    <\equation*>
      f<around*|(|<wide|x|^>|)>+\<sigma\><rsub|k><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|<wide|x|^>|)>|\<\|\|\>><rsup|2>\<geqslant\>f<around*|(|x<around*|(|\<sigma\><rsub|k>|)>|)>+\<sigma\><rsub|k><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|k>|)>|)>|\<\|\|\>><rsup|2>\<geqslant\>f<around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>+\<sigma\><rsub|k><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|k>|)>|)>|\<\|\|\>><rsup|2>,
    </equation*>

    which means

    <\equation*>
      <around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|<wide|x|^>|)>|\<\|\|\>><rsup|2>\<geqslant\><around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|x<around*|(|\<sigma\><rsub|k>|)>|)>|\<\|\|\>><rsup|2>+<frac|f<around*|(|x<around*|(|\<sigma\><rsub|1>|)>|)>-f<around*|(|<wide|x|^>|)>|\<sigma\><rsub|k>>\<gtr\>\<varepsilon\><rsup|2>,as
      \<sigma\><rsub|k>\<rightarrow\>+\<infty\>,
    </equation*>

    a contradiction with <math|<around*|\<\|\|\>|<wide|c|\<bar\>><around*|(|<wide|x|^>|)>|\<\|\|\>>\<less\>\<varepsilon\>>.
  </proof>
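The lemmas and the theorem suggest the basic algorithm: minimize P(x, σₖ) for an increasing sequence σₖ, warm-starting each solve from the previous one. Below is a minimal Python sketch with gradient descent as the inner solver (hypothetical names; the 1/σ step-size shrinkage is a heuristic for the growing curvature of P).

```python
def penalty_minimize(grad_f, cbar, grad_cbar, x0,
                     sigmas=(1.0, 10.0, 100.0, 1e3, 1e4), iters=5000):
    """Gradient descent on P(x, sigma) = f(x) + sigma*cbar(x)**2 for an
    increasing sequence sigma_k, warm-starting from x(sigma_{k-1})."""
    x = x0
    for sigma in sigmas:
        step = 0.1 / (1.0 + sigma)   # heuristic: curvature grows with sigma
        for _ in range(iters):
            x -= step * (grad_f(x) + 2.0 * sigma * cbar(x) * grad_cbar(x))
    return x
```

For example, min x² s.t. x − 1 = 0 gives x(σₖ) = σₖ/(1 + σₖ), which approaches the true solution x = 1 as σₖ grows, illustrating the lemmas above.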

  <subsubsection|Augmented Lagrange penalty function method>

  Although the penalty function method is mathematically sound, it needs
  <math|\<sigma\><rsub|k>\<rightarrow\>\<infty\>> to reach the solution,
  which is not useful in practice. A method to address this problem is to
  introduce the Lagrange multipliers <math|\<lambda\><rsub|i>>.

  To introduce the idea, let's consider equality-constrained problem,

  <\equation*>
    min f<around*|(|x|)> s.t. c<rsub|i><around*|(|x|)>=0,i=1,\<cdots\>,m.
  </equation*>

  As we know, the solution <math|x<rsup|\<ast\>>> of the problem is also a
  stationary point of the Lagrange function

  <\equation*>
    L<around*|(|x,\<lambda\>|)>=f<around*|(|x|)>-\<lambda\><rsup|T>c<around*|(|x|)>,
  </equation*>

  but generally <math|x<rsup|\<ast\>>>, with the multiplier
  <math|\<lambda\><rsup|\<ast\>>>, is not a local minimum of
  <math|L<around*|(|x,\<lambda\>|)>>.

  As in the previous subsubsection, we can introduce a penalty function,

  <\equation*>
    P<around*|(|x,\<lambda\><rsup|\<ast\>>,\<sigma\>|)>=L<around*|(|x,\<lambda\><rsup|\<ast\>>|)>+<frac|1|2>\<sigma\><around*|\<\|\|\>|c<around*|(|x|)>|\<\|\|\>><rsup|2>,
  </equation*>

  one can see that if <math|\<sigma\>> is large enough (but finite), the
  penalty function gives a sequence <math|x<rsub|k>> which approaches
  <math|x<rsup|\<ast\>>>.

  <\definition>
    <with|font-series|bold|Penalty function with Lagrange's multiplier>

    <\eqnarray>
      <tformat|<table|<row|<cell|P<around*|(|x,\<lambda\>,\<sigma\>|)>>|<cell|\<assign\>>|<cell|f<around*|(|x|)>>>|<row|<cell|>|<cell|->|<cell|<big|sum><rsub|e=1><rsup|m<rsub|e>><around*|[|\<lambda\><rsup|<around*|(|e|)>>c<rsub|e><around*|(|x|)>-<frac|1|2>\<sigma\><rsup|<around*|(|e|)>>c<rsub|e><around*|(|x|)><rsup|2>|]>>>|<row|<cell|>|<cell|->|<cell|<big|sum><rsub|i=m<rsub|e>+1><rsup|m><choice|<tformat|<table|<row|<cell|\<lambda\><rsup|<around*|(|i|)>>c<rsub|i><around*|(|x|)>-<frac|1|2>\<sigma\><rsup|<around*|(|i|)>>c<rsub|i><around*|(|x|)><rsup|2>,>|<cell|if
      c<rsub|i><around*|(|x|)>\<less\>\<lambda\><rsup|<around*|(|i|)>>/\<sigma\><rsup|<around*|(|i|)>>,>>|<row|<cell|<frac|1|2>\<lambda\><rsup|<around*|(|i|)>2>/\<sigma\><rsup|<around*|(|i|)>>,>|<cell|otherwise.>>>>>>>>>
    </eqnarray>
  </definition>
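For the purely equality-constrained case, the resulting algorithm is simple: alternately minimize P(x, λ, σ) in x and update the multiplier by λ ← λ − σc(x), keeping σ finite. A minimal Python sketch (single constraint, gradient descent as the inner solver; names hypothetical):

```python
def augmented_lagrangian(grad_f, c, grad_c, x0,
                         sigma=10.0, outer=20, inner=2000, step=5e-3):
    """Alternate minimizing P(x, lam, sigma) = f(x) - lam*c(x)
    + 0.5*sigma*c(x)**2 with the multiplier update lam <- lam - sigma*c(x);
    sigma stays finite, unlike in the pure penalty method."""
    x, lam = x0, 0.0
    for _ in range(outer):
        for _ in range(inner):
            gP = grad_f(x) - lam * grad_c(x) + sigma * c(x) * grad_c(x)
            x = x - step * gP
        lam = lam - sigma * c(x)
    return x, lam
```

For example, min x² s.t. x − 1 = 0 yields x → 1 and λ → 2 (the multiplier solving ∇f = λ∇c at the solution), with σ held fixed at 10.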

  xxxxxx

  \;

  \;

  \;

  \;

  \;

  \;

  <section|Algebraic equations>

  <strong|Problem>: solving <math|\<b-x\>> in the eqn(s):

  <\equation*>
    \<b-f\><around*|(|\<b-x\>|)>=\<b-0\>.
  </equation*>

  <subsection|Single variable eqn>

  The algebraic eqn problem for a single variable eqn is to solve <math|x> in
  the eqn:

  <\equation*>
    f<around*|(|x|)>=0.
  </equation*>

  <strong|Iterative method>: Try to find an iterative sequence
  <math|<around*|{|x<rsub|0>,x<rsub|1>,\<cdots\>,x<rsub|n>,x<rsub|n+1>,\<cdots\>|}>>
  whose limit is just the solution: <math|x=lim<rsub|n\<rightarrow\>\<infty\>>x<rsub|n>>.
  Then <math|x<rsub|n>> can approximate the solution when <math|n> is large
  enough.

  The problem is how to find the iterative relation.

  <subsubsection|Bisection method>

  Assume <math|f<around*|(|a|)>f<around*|(|b|)>\<leqslant\>0>; then by the
  continuity of <math|f<around*|(|x|)>>, there exists a solution <math|x> in
  <math|<around*|(|a,b|)>>.

  Choose the midpoint <math|x<rsub|mid>=<frac|a+b|2>>, and we can get a
  smaller interval <math|<around*|(|a<rprime|'>,b<rprime|'>|)>> which is
  <math|<around*|(|a,x<rsub|mid>|)>> or <math|<around*|(|x<rsub|mid>,b|)>>
  still having <math|f<around*|(|a<rprime|'>|)>f<around*|(|b<rprime|'>|)>\<leqslant\>0>.

  This gives an iteration, called the <strong|Bisection method>:

  <\eqnarray>
    <tformat|<table|<row|<cell|i<rsub|0>>|<cell|=>|<cell|<around*|(|a,b|)>,>>|<row|<cell|i<rsub|n+1>>|<cell|=>|<cell|<choice|<tformat|<table|<row|<cell|<around*|(|left
    <around*|(|i<rsub|n>|)>,mid<around*|(|i<rsub|n>|)>|)>,>|<cell|if
    f<around*|(|left <around*|(|i<rsub|n>|)>|)>f<around*|(|mid<around*|(|i<rsub|n>|)>|)>\<leqslant\>0>>|<row|<cell|<around*|(|mid<around*|(|i<rsub|n>|)>,right<around*|(|i<rsub|n>|)>|)>,>|<cell|else>>>>>.>>>>
  </eqnarray>

  and

  <\equation*>
    x<rsub|n>=mid<around*|(|i<rsub|n>|)>.
  </equation*>

  <\scm-code>
    ;;\<less\>Scheme code\<gtr\> ------ Bisection method

    ;;; Problem: solve x in f(x)=0, in (a b) with f(a)*f(b)\<less\>0.

    ;;1. next-interval i_n =\<gtr\> i_{n+1}

    (define (next-interval interval f);an interval is a pair (cons x0 x1)

    \ \ (let* ([x0 (car interval)]

    \ \ \ \ \ \ \ \ \ [x1 (cdr interval)]

    \ \ \ \ \ \ \ \ \ [x2 (/ (+ x0 x1) 2.0)])

    \ \ \ \ (if (\<less\>= (* (f x0) (f x2)) 0)

    \ \ \ \ \ \ \ \ (cons x0 x2)

    \ \ \ \ \ \ \ \ (cons x2 x1))))

    ;;2. interval stream (i0 i1 i2..)

    (define (Bisection-stream f init-interval);i0 = init-interval

    \ \ (stream-cons init-interval

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (Bisection-stream f

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (next-interval
    init-interval f))))

    ;;3. solution-stream (x0 x1 x2 ..)

    (define (Bisection-solutions f a b)

    \ \ (stream-map (lambda (interval)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (/ (+ (car interval) (cdr interval))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 2.0))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ (Bisection-stream f (cons a b))))
  </scm-code>

  <subsubsection|Iteration-function method>

  The iterative sequence may be defined by an iteration function <math|g> as

  <\equation*>
    x<rsub|n+1>=g<around*|(|x<rsub|n>,x<rsub|n-1>,\<cdots\>,x<rsub|n-K>|)>,K\<geqslant\>0.
  </equation*>

  The simplest case is <math|K=0> called <strong|Fixed-point method>, i.e.

  <\equation*>
    x<rsub|n+1>=g<around*|(|x<rsub|n>|)>.
  </equation*>

  Obviously, the solution <math|x> is just the solution of
  <math|x=g<around*|(|x|)>>, which means <math|x> is the fixed point of
  <math|g>.

  But how to find such an iteration function <math|g> from <math|f>?

  <strong|Newton method>: Newton gives one which converges quite fast,

  <\equation*>
    g<around*|(|x|)>\<assign\>x-<frac|f<around*|(|x|)>|f<rprime|'><around*|(|x|)>>.
  </equation*>

  [*Idea*: Suppose <math|x<rsub|\<ast\>>> is the solution, and
  <math|x> is near <math|x<rsub|\<ast\>>>, then the approximation
  <math|0=f<around*|(|x<rsub|\<ast\>>|)>\<simeq\>f<around*|(|x|)>+f<rprime|'><around*|(|x|)><around*|(|x<rsub|\<ast\>>-x|)>>
  gives <math|x<rsub|\<ast\>>\<simeq\>g<around*|(|x|)>>.]
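As a minimal sketch (plain Python rather than the Scheme streams used elsewhere in these notes; f and its derivative are supplied by the caller), the Newton iteration is simply:

```python
def newton(f, fprime, x0, iters=50):
    """Fixed-point iteration x_{n+1} = g(x_n) with g(x) = x - f(x)/f'(x)."""
    x = x0
    for _ in range(iters):
        x = x - f(x) / fprime(x)
    return x
```

For example, solving x² − 2 = 0 from x₀ = 1 converges to √2 in a handful of iterations.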

  The defect of the Newton method is that the derivative of <math|f> at
  <math|x> may be hard to get.

  To avoid this, one can replace <math|f<rprime|'><around*|(|x<rsub|n>|)>\<rightarrow\><frac|f<around*|(|x<rsub|n>|)>-f<around*|(|x<rsub|n-1>|)>|x<rsub|n>-x<rsub|n-1>>>,
  which gives an iterative function with <math|K=1>, called the
  <strong|Newton-Secant method>: [secant means cut]

  <\equation*>
    x<rsub|n+1>=x<rsub|n>-<around*|(|x<rsub|n>-x<rsub|n-1>|)><frac|f<around*|(|x<rsub|n>|)>|f<around*|(|x<rsub|n>|)>-f<around*|(|x<rsub|n-1>|)>>.
  </equation*>

  <\scm-code>
    ;;\<less\>code\<gtr\>------- Newton-Secant method

    ;;next-solution

    (define (next-point x0 x1 f);(x0,x1) -\<gtr\> x2

    \ \ (- x1 (* (- x1 x0) (f x1)

    \ \ \ \ \ \ \ \ \ \ \ (/ 1.0 (- (f x1) (f x0))))))

    ;;solution stream

    (define (Newton-Secant-stream f x0 x1)

    \ \ (stream-cons x0

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (Newton-Secant-stream f x1 (next-point x0
    x1 f))))
  </scm-code>

  <subsubsection|Mixed method>

  The Newton-Secant method is fast, but it may happen that the iteration does
  not converge. To guard against this, one can mix the Bisection method
  (safe) with the Newton-Secant method (unsafe).

  The Bisection method is safe since <math|x\<in\>\<cdots\>\<subset\>i<rsub|n+1>\<subset\>i<rsub|n>\<subset\>\<cdots\>\<subset\>i<rsub|0>>;
  we can modify it so that a boundary point of <math|i<rsub|n+1>> comes from
  the Newton-Secant method:

  <\eqnarray>
    <tformat|<table|<row|<cell|i<rsub|0>>|<cell|=>|<cell|<around*|(|a,b|)>,>>|<row|<cell|i<rsub|n+1>>|<cell|=>|<cell|<choice|<tformat|<table|<row|<cell|<around*|(|left<around*|(|i<rsub|n>|)>,x<rsub|mix>|)>,>|<cell|if
    f<around*|(|x<rsub|mix>|)>f<around*|(|left<around*|(|i<rsub|n>|)>|)>\<leqslant\>0>>|<row|<cell|<around*|(|x<rsub|mix>,right<around*|(|i<rsub|n>|)>|)>,>|<cell|else>>>>>,where>>|<row|<cell|x<rsub|mix>>|<cell|=>|<cell|<choice|<tformat|<table|<row|<cell|NewtonSecant<around*|(|f,left<around*|(|i<rsub|n>|)>,right<around*|(|i<rsub|n>|)>|)>,>|<cell|if
    NewtonSecant<around*|(|f,left<around*|(|i<rsub|n>|)>,right<around*|(|i<rsub|n>|)>|)>\<in\>i<rsub|n>>>|<row|<cell|mid<around*|(|i<rsub|n>|)>,>|<cell|else>>>>>.>>>>
  </eqnarray>

  <strong|Note:> The Mixed method is both fast and safe.

  <\scm-code>
    ;;\<less\>code\<gtr\>----- Mixed method

    ;;i_n =\<gtr\> i_{n+1}

    (define (next-interval/mix interval f)

    \ \ (let* ([x0 (car interval)]

    \ \ \ \ \ \ \ \ \ [x1 (cdr interval)]

    \ \ \ \ \ \ \ \ \ [x2 (let [(xns (next-point x0 x1 f))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (xmid (/ (+ x0 x1) 2.0))]

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (if (and (\<gtr\> xns x0) (\<less\> xns
    x1))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ xns

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ xmid))])

    \ \ \ \ (if (\<less\>= (* (f x0) (f x2)) 0)

    \ \ \ \ \ \ \ \ (cons x0 x2)

    \ \ \ \ \ \ \ \ (cons x2 x1))))

    ;;interval stream (i0 i1 i2..)

    (define (Mixed-stream f init-interval);i0 = init-interval

    \ \ (stream-cons init-interval

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (Mixed-stream f

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (next-interval/mix
    init-interval f))))

    ;;solution stream

    (define (Mixed-solutions f a b)

    \ \ (stream-map (lambda (interval) (if (\<less\> (abs (f (car interval)))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (abs
    (f (cdr interval))))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (car
    interval)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (cdr
    interval)))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ (Mixed-stream f (cons a b))))
  </scm-code>

  <subsubsection|Accelerating iteration: Aitken method and its super
  version>

  Suppose we have a convergent sequence <math|<around*|(|x<rsub|0>,x<rsub|1>,x<rsub|2>,\<cdots\>,x<rsub|n>,\<cdots\>|)>>,
  and the convergence is of <em|linear order>; then we can accelerate it to
  get a new convergent sequence
  <math|<around*|(|<wide|x|~><rsub|0>,<wide|x|~><rsub|1>,<wide|x|~><rsub|2>,\<cdots\>,<wide|x|~><rsub|n>,\<cdots\>|)>>:

  <\equation*>
    <wide|x|~><rsub|n>=x<rsub|n>-<frac|<around*|(|x<rsub|n+1>-x<rsub|n>|)><rsup|2>|x<rsub|n+2>-2x<rsub|n+1>+x<rsub|n>>.
  </equation*>

  This accelerating method is called the <strong|Aitken method>.

  [<em|*Idea*>: Let <math|x=lim<rsub|n\<rightarrow\>\<infty\>>x<rsub|n>>.
  Since the convergence is assumed to be of linear order, i.e.

  <\equation*>
    lim<rsub|n\<rightarrow\>\<infty\>><frac|x<rsub|n+1>-x|x<rsub|n>-x>=\<lambda\>=lim<rsub|n\<rightarrow\>\<infty\>><frac|x<rsub|n+2>-x|x<rsub|n+1>-x>,
  </equation*>

  by solving <math|<wide|x|~>> in

  <\equation*>
    <frac|x<rsub|n+1>-<wide|x|~>|x<rsub|n>-<wide|x|~>>=<frac|x<rsub|n+2>-<wide|x|~>|x<rsub|n+1>-<wide|x|~>>,
  </equation*>

  one can get the <math|<wide|x|~>=<wide|x|~><rsub|n>> given above.\ 

  Note that the Aitken method is like a Newton method that does not need the
  function itself, which makes it quite useful.]

  Furthermore, one can repeatedly accelerate the accelerated sequence, and so
  on, which gives a <strong|Super acceleration method>:

  <\equation*>
    <tabular*|<tformat|<table|<row|<cell|seq<rsup|<around*|(|0|)>>:>|<cell|>|<cell|x<rsub|0><rsup|<around*|(|0|)>>>|<cell|x<rsub|1><rsup|<around*|(|0|)>>>|<cell|x<rsub|2><rsup|<around*|(|0|)>>>|<cell|x<rsub|3><rsup|<around*|(|0|)>>>|<cell|x<rsub|4><rsup|<around*|(|0|)>>>|<cell|x<rsub|5><rsup|<around*|(|0|)>>>|<cell|x<rsub|6><rsup|<around*|(|0|)>>>|<cell|x<rsub|7><rsup|<around*|(|0|)>>>|<cell|x<rsub|8><rsup|<around*|(|0|)>>>|<cell|\<cdots\>>>|<row|<cell|seq<rsup|<around*|(|1|)>>:>|<cell|>|<cell|>|<cell|x<rsub|0><rsup|<around*|(|1|)>>>|<cell|x<rsub|1><rsup|<around*|(|1|)>>>|<cell|x<rsub|2><rsup|<around*|(|1|)>>>|<cell|x<rsub|3><rsup|<around*|(|1|)>>>|<cell|x<rsub|4><rsup|<around*|(|1|)>>>|<cell|x<rsub|5><rsup|<around*|(|1|)>>>|<cell|x<rsub|6><rsup|<around*|(|1|)>>>|<cell|x<rsub|7><rsup|<around*|(|1|)>>>|<cell|\<cdots\>>>|<row|<cell|seq<rsup|<around*|(|2|)>>:>|<cell|>|<cell|>|<cell|>|<cell|x<rsub|0><rsup|<around*|(|2|)>>>|<cell|x<rsub|1><rsup|<around*|(|2|)>>>|<cell|x<rsub|2><rsup|<around*|(|2|)>>>|<cell|x<rsub|3><rsup|<around*|(|2|)>>>|<cell|x<rsub|4><rsup|<around*|(|2|)>>>|<cell|x<rsub|5><rsup|<around*|(|2|)>>>|<cell|x<rsub|6><rsup|<around*|(|2|)>>>|<cell|\<cdots\>>>|<row|<cell|\<vdots\>>|<cell|>|<cell|>|<cell|>|<cell|>|<cell|\<ddots\>>|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|>>>>>
  </equation*>

  \;

  <\scm-code>
    ;;\<less\>code\<gtr\>-- super-mixed method

    ;; stream of streams

    (define (stream-of-streams trans s)

    \ \ (stream-cons s (stream-of-streams trans (trans s))))

    ;; super accelerator

    (define (Super-accelerator-stream accelerator s)

    \ \ (stream-map stream-car (stream-of-streams accelerator s)))

    ;;^^^^^^^^^ above is general and can be used in other places

    ;; Aitken accelerator

    (define (Aitken-accelerator strm)

    \ \ (let* ([x0 (stream-ref strm 0)]

    \ \ \ \ \ \ \ \ \ [x1 (stream-ref strm 1)]

    \ \ \ \ \ \ \ \ \ [x2 (stream-ref strm 2)]

    \ \ \ \ \ \ \ \ \ [denomi (+ x0 x2 (* -2 x1))])

    \ \ \ \ (stream-cons (if (= denomi 0.0)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x2

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (- x2 (/ (expt (- x2 x1) 2)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ denomi)))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (Aitken-accelerator (stream-cdr
    strm)))))

    \;

    ;; The final solutions

    (define (Super-Mixed-stream f a b)

    \ \ (Super-accelerator-stream Aitken-accelerator (Mixed-solutions f a
    b)))
  </scm-code>

  <\note*>
    (1) The iteration-function method is a general method for solving many
    problems;

    (2) The Aitken acceleration method can also be used on many convergent
    sequences.
  </note*>

  <subsection|Linear eqns>

  <strong|Problem:> Solve <math|<wide|x|\<vect\>>> in the linear eqns

  <\equation*>
    A<wide|x|\<vect\>>=<wide|b|\<vect\>>,
  </equation*>

  where <math|det<around*|(|A|)>\<neq\>0>.

  <subsubsection|Direct methods>

  Direct methods can solve systems of not too high order. One direct method
  is the famous <strong|Gauss elimination method>; another is the
  <strong|least squares (LS) method>, which uses the Cholesky decomposition.
  Here we consider the latter.

  Since <math|A<rsup|-1>> exists, <math|A<rsup|T>A\<equiv\>C> is symmetric
  and positive-definite. Then by the <strong|Cholesky decomposition>, there
  exists a unique non-singular lower triangular matrix <math|L>, s.t.,
  <math|A<rsup|T>A=L L<rsup|T>>, whose explicit expression is

  <\eqnarray>
    <tformat|<table|<row|<cell|l<rsub|i i>>|<cell|=>|<cell|<sqrt|c<rsub|i
    i>-<big|sum><rsub|k=1><rsup|i-1>l<rsub|i
    k><rsup|2>>,>>|<row|<cell|l<rsub|i j>>|<cell|=>|<cell|<around*|(|c<rsub|i
    j>-<big|sum><rsub|k=1><rsup|j-1>l<rsub|i k>l<rsub|j k>|)>/l<rsub|j
    j>,where i\<gtr\>j.>>>>
  </eqnarray>

  \;

  Thus one can solve <math|<wide|x|\<vect\>>> in two steps:\ 

  <\eqnarray>
    <tformat|<table|<row|<cell|L<wide|y|\<vect\>>>|<cell|=>|<cell|A<rsup|T>
    <wide|b|\<vect\>>,>>|<row|<cell|L<rsup|T><wide|x|\<vect\>>>|<cell|=>|<cell|<wide|y|\<vect\>>.>>>>
  </eqnarray>

  Note that there is an improved version using the <math|L D
  L<rsup|T>>-decomposition, which avoids the sqrt operation.
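The decomposition and the two-step solve can be sketched directly from the formulas above (pure Python, hypothetical names; for the LS method one calls it with C = AᵀA and d = Aᵀb):

```python
def cholesky(C):
    """Lower triangular L with C = L L^T, by the formulas above."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = (C[j][j] - sum(L[j][k] ** 2 for k in range(j))) ** 0.5
        for i in range(j + 1, n):
            L[i][j] = (C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def solve_spd(C, d):
    """Solve C x = d for symmetric positive-definite C:
    forward substitution L y = d, then back substitution L^T x = y."""
    n = len(C)
    L = cholesky(C)
    y = [0.0] * n
    for i in range(n):
        y[i] = (d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```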

  <subsubsection|Iterative methods>

  Here we consider iterative methods, i.e. finding a matrix <math|G> such
  that the original problem above is equivalent to

  <\equation>
    <wide|x|\<vect\>>=G<wide|x|\<vect\>>+<wide|h|\<vect\>><label|iteration-eqns-linear>,
  </equation>

  with <math|det<around*|(|I-G|)>\<neq\>0>.

  The advantages of iterative methods are: (i) they can solve systems of
  large order; (ii) they can be run in parallel; (iii) they avoid the
  error accumulation of the Gauss elimination method.
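A classical concrete choice of G is the Jacobi splitting A = D + R, with D the diagonal of A, giving G = −D⁻¹R and h = D⁻¹b. A minimal Python sketch (hypothetical names):

```python
def jacobi(A, b, iters=200):
    """Jacobi iteration x_{n+1} = G x_n + h, with G = -D^{-1} R and
    h = D^{-1} b, where D = diag(A) and R = A - D."""
    n = len(A)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x
```

The iteration converges when ρ(G) < 1, e.g. for strictly diagonally dominant A.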

  But when would an iteration method (<reference|iteration-eqns-linear>)
  fail?

  <\theorem>
    The iterative sequence given by the iteration relation
    (<reference|iteration-eqns-linear>) is convergent if and only if the
    spectral radius of <math|G> is smaller than 1,

    <\equation*>
      \<rho\><around*|(|G|)>\<less\>1,
    </equation*>

    where <math|\<rho\><around*|(|G|)>=max<around*|{|<around*|\||\<lambda\><rsub|i>|\|>|}>>
    and the <math|\<lambda\><rsub|i>> are the eigenvalues of <math|G>.

    <\proof>
      Define the difference <math|<wide|e|\<vect\>><rsub|n+1>=<wide|x|\<vect\>><rsub|n+1>-<wide|x|\<vect\>><rsub|n>>,
      then by <math|<wide|x|\<vect\>><rsub|n+1>=G<wide|x|\<vect\>><rsub|n>+<wide|h|\<vect\>>>
      and <math|<wide|x|\<vect\>><rsub|n>=G<wide|x|\<vect\>><rsub|n-1>+<wide|h|\<vect\>>>,
      one gets the relations of difference
      <math|<wide|e|\<vect\>><rsub|n+1>=G<wide|e|\<vect\>><rsub|n>>, which
      means

      <\equation*>
        <wide|e|\<vect\>><rsub|n>=G<rsup|n><wide|e|\<vect\>><rsub|0>.
      </equation*>

      By the Cauchy convergence criterion, <math|<around*|{|<wide|x|\<vect\>><rsub|n>|}>>
      converges iff <math|<wide|e|\<vect\>><rsub|n>\<rightarrow\>0>, which
      is equivalent to <math|G<rsup|n>\<rightarrow\>O>.

      Since any matrix can be brought into Jordan form, i.e.

      <\equation*>
        G=S <around*|[|J<rsub|a>|]> S<rsup|-1>,where
        J<rsub|a>=\<lambda\><rsub|a>E<rsub|a>+\<Delta\><rsub|a>,\<Delta\><rsub|a>=<around*|(|<tabular*|<tformat|<table|<row|<cell|0>|<cell|1>|<cell|>|<cell|>|<cell|>>|<row|<cell|>|<cell|0>|<cell|1>|<cell|O>|<cell|>>|<row|<cell|>|<cell|>|<cell|0>|<cell|\<ddots\>>|<cell|>>|<row|<cell|>|<cell|O>|<cell|>|<cell|0>|<cell|1>>|<row|<cell|>|<cell|>|<cell|>|<cell|>|<cell|0>>>>>|)><rsub|a\<times\>a>,
      </equation*>

      and <math|\<Delta\><rsub|a><rsup|a>=O>, we have for <math|n\<gtr\>a>,
      <math|S<rsup|-1>G<rsup|n>S=<around*|[|J<rsub|a><rsup|n>|]>>, where

      <\equation*>
        J<rsub|a><rsup|n>=\<lambda\><rsub|a><rsup|n>E<rsub|a>+C<rsub|n><rsup|1>\<lambda\><rsup|n-1><rsub|a>\<Delta\><rsub|a>+\<cdots\>+C<rsub|n><rsup|a-1>\<lambda\><rsub|a><rsup|n-a+1>\<Delta\><rsub|a><rsup|a-1>.
      </equation*>

      If all <math|<around*|\||\<lambda\><rsub|a>|\|>\<less\>1>, each term
      <math|C<rsub|n><rsup|j>\<lambda\><rsub|a><rsup|n-j>\<rightarrow\>0> as
      <math|n\<rightarrow\>\<infty\>>, which means <math|G<rsup|n>\<rightarrow\>O>.
      On the other hand, if there exists an eigenvalue
      <math|\<lambda\><rsub|a>> with <math|<around*|\||\<lambda\><rsub|a>|\|>\<geqslant\>1>,
      then <math|\<lambda\><rsub|a><rsup|n>> does not tend to <math|0> and
      <math|C<rsub|n><rsup|j>\<lambda\><rsup|n-j><rsub|a>\<rightarrow\>\<infty\>>
      for <math|j\<geqslant\>1> as <math|n\<rightarrow\>\<infty\>>, i.e.
      <math|<around*|(|J<rsub|a>|)><rsub|i,j\<gtr\>i>\<rightarrow\>\<infty\>>,
      so <math|G<rsup|n>> does not tend to <math|O>.

      \;
    </proof>
  </theorem>

  <\note>
    Could the Aitken method be used here? The answer is yes!

    Consider <math|J<rsub|a>=\<lambda\>E+\<Delta\>> with
    <math|<around*|\||\<lambda\>|\|>\<less\>1>: each entry of
    <math|J<rsub|a><rsup|n>>, such as the <math|<around*|(|i,i+j|)>> element,
    converges linearly, since <math|<frac|C<rsub|n+1><rsup|j>\<lambda\><rsup|n+1-j>-0|C<rsub|n><rsup|j>\<lambda\><rsup|n-j>-0>\<rightarrow\>\<lambda\>>.
  </note>
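  This linear-convergence claim is easy to check numerically. A minimal
  Python sketch (illustrative only; the Scheme `Aitken-acceleration` code
  later in these notes applies the same formula componentwise to vectors) of
  Aitken's <math|\<Delta\><rsup|2>> formula on a scalar fixed-point iteration
  with exactly geometric errors, where a single extrapolation already hits
  the limit:

  ```python
  def aitken(s0, s1, s2):
      """Aitken Delta^2 extrapolation of three successive iterates."""
      denom = s0 - 2.0 * s1 + s2
      return s2 if denom == 0 else s2 - (s2 - s1) ** 2 / denom

  # fixed-point iteration x_{n+1} = 0.5 x_n + 1, whose limit is 2;
  # the errors are exactly geometric, so Aitken recovers the limit at once
  x0 = 0.0
  x1 = 0.5 * x0 + 1.0
  x2 = 0.5 * x1 + 1.0
  print(aitken(x0, x1, x2))  # -> 2.0
  ```
  
  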

  The key problem is to find appropriate <math|G> and
  <math|<wide|h|\<vect\>>>.

  The following parts are all about iterative methods.

  <subsubsection|Jacobi method>

  Assume <math|a<rsub|i i>\<neq\>0,\<forall\>i> [<em|don't worry about
  this>], let <math|D=diag<around*|(|a<rsub|11>,\<cdots\>,a<rsub|n n>|)>>,
  and split <math|A> into <math|A=D+Q>; then:

  <\equation>
    <wide|x|\<vect\>>=-D<rsup|-1>Q<wide|x|\<vect\>>+D<rsup|-1><wide|b|\<vect\>><label|Jacobi-iteration-func>,
  </equation>

  thus we get an iteration method called <strong|Jacobi method> with

  <\eqnarray>
    <tformat|<table|<row|<cell|G>|<cell|=>|<cell|-D<rsup|-1>Q=<around*|(|<tabular*|<tformat|<table|<row|<cell|0>|<cell|-<frac|a<rsub|12>|a<rsub|11>>>|<cell|\<cdots\>>|<cell|-<frac|a<rsub|1n>|a<rsub|11>>>>|<row|<cell|-<frac|a<rsub|21>|a<rsub|22>>>|<cell|0>|<cell|\<cdots\>>|<cell|-<frac|a<rsub|2n>|a<rsub|22>>>>|<row|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<ddots\>>|<cell|\<nosymbol\>\<vdots\>>>|<row|<cell|-<frac|a<rsub|n1>|a<rsub|n
    n>>>|<cell|-<frac|a<rsub|n2>|a<rsub|n
    n>>>|<cell|>|<cell|0>>>>>|)>,>>|<row|<cell|<wide|h|\<vect\>>>|<cell|=>|<cell|D<rsup|-1><wide|b|\<vect\>>=<around*|(|<frac|b<rsub|1>|a<rsub|11>>,\<cdots\>,<frac|b<rsub|n>|a<rsub|n
    n>>|)>.>>>>
  </eqnarray>

  A more explicit expression (also faster and easier to implement) is the component form:

  <\equation*>
    x<rsub|i><rsup|<around*|(|n+1|)>>=<frac|1|a<rsub|i
    i>><around*|(|b<rsub|i>-<big|sum><rsub|j\<neq\>i>a<rsub|i
    j>x<rsub|j><rsup|<around*|(|n|)>>|)>.
  </equation*>

  <\scm-code>
    ;;\<less\>code\<gtr\>-- Jacobi method

    ;; next-solution

    (define (nextstate-Jacobi A b x n);;n=dim(x)

    \ \ (vector-map

    \ \ \ (lambda (i)

    \ \ \ \ \ (/ (- (vector-ref b i)

    \ \ \ \ \ \ \ \ \ \ \ (apply + (map (lambda (j)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (if (= j i)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (*
    (matrix-ref A i j)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (vector-ref
    x j))))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (range n))))

    \ \ \ \ \ \ \ \ (matrix-ref A i i)))

    \ \ \ (vector-range n)))
  </scm-code>

  <subsubsection|Gauss-Seidel method>

  Assume <math|a<rsub|i i>\<neq\>0,\<forall\>i>, and split <math|A> into
  three parts: the diagonal part <math|D=diag<around*|(|a<rsub|11>,\<cdots\>,a<rsub|n
  n>|)>>, the lower-triangular part <math|L> and the upper-triangular part
  <math|U>, so that <math|A=D+L+U>. Then
  <math|<around*|(|D+L|)><wide|x|\<vect\>>=-U<wide|x|\<vect\>>+<wide|b|\<vect\>>>,
  which gives

  <\equation>
    <wide|x|\<vect\>>=-<around*|(|D+L|)><rsup|-1>U<wide|x|\<vect\>>+<around*|(|D+L|)><rsup|-1><wide|b|\<vect\>><label|Gauss-Siedel-iteration-func>,
  </equation>

  so that

  <\eqnarray>
    <tformat|<table|<row|<cell|G>|<cell|=>|<cell|-<around*|(|D+L|)><rsup|-1>U,>>|<row|<cell|<wide|h|\<vect\>>>|<cell|=>|<cell|<around*|(|D+L|)><rsup|-1><wide|b|\<vect\>>.>>>>
  </eqnarray>

  These eqns are not as hard to solve as they look: by the iteration

  <\equation*>
    <around*|(|D+L|)><wide|x|\<vect\>><rsup|<around*|(|n+1|)>>=-U<wide|x|\<vect\>><rsup|<around*|(|n|)>>+<wide|b|\<vect\>>\<Longrightarrow\>D<wide|x|\<vect\>><rsup|<around*|(|n+1|)>>=-L<wide|x|\<vect\>><rsup|<around*|(|n+1|)>>-U<wide|x|\<vect\>><rsup|<around*|(|n|)>>+<wide|b|\<vect\>>,
  </equation*>

  since <math|L> is lower-triangular, when solving
  <math|x<rsup|<around*|(|n+1|)>><rsub|j>>, the right-hand side does not
  involve <math|x<rsup|<around*|(|n+1|)>><rsub|l\<geqslant\>j>>.

  The explicit expression is

  <\equation*>
    x<rsub|i><rsup|<around*|(|n+1|)>>=<frac|1|a<rsub|i
    i>><around*|(|b<rsub|i>-<big|sum><rsub|j\<less\>i>a<rsub|i
    j>x<rsub|j><rsup|<around*|(|n+1|)>>-<big|sum><rsub|j\<gtr\>i>a<rsub|i
    j>x<rsub|j><rsup|<around*|(|n|)>>|)>.
  </equation*>

  <\note>
    (i) The Gauss-Seidel method is a modification of the Jacobi method that
    <em|updates with those <math|x<rsub|j>>s that have already been
    computed>;

    (ii) Although simple, the Gauss-Seidel method can greatly improve on the
    Jacobi method.

    (iii) The elements <math|x<rsub|i>> are updated in the order
    <math|x<rsub|1>,x<rsub|2>,\<cdots\>>, but this order seems not
    essential. So which order is best? Is it better to first update the
    components with larger <math|<around*|\||<frac|a<rsub|i j>|a<rsub|i
    i>>|\|>>?
  </note>

  <\scm-code>
    ;;\<less\>code\<gtr\>---GaussSiedel method

    ;;next-solution

    (define (nextstate-GaussSiedel A b x n)

    \ \ (do ((newx (vector-copy x))

    \ \ \ \ \ \ \ (i 0 (+ i 1)))

    \ \ \ \ \ \ ((= i n) newx);;newx

    \ \ \ \ (vector-set! newx i

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (/ (- (vector-ref b i)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (do ((sum 0)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (j 0 (+ j 1)))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ((= j n) sum);;sum

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (if (not (= j i))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (set! sum (+
    sum

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (*
    (matrix-ref A i j)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (vector-ref
    newx j);not x

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ))))))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (matrix-ref A i i)))))
  </scm-code>
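  A minimal Python sketch (for illustration; it mirrors the two Scheme
  procedures above, with the matrix/vector helpers replaced by plain lists)
  comparing a Jacobi sweep against a Gauss-Seidel sweep on a small strictly
  diagonally dominant system:

  ```python
  def jacobi_step(A, b, x):
      """One Jacobi sweep: every component uses the OLD iterate x."""
      n = len(x)
      return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
              for i in range(n)]

  def gauss_seidel_step(A, b, x):
      """One Gauss-Seidel sweep: components already updated are reused."""
      n = len(x)
      new = list(x)
      for i in range(n):
          new[i] = (b[i] - sum(A[i][j] * new[j] for j in range(n) if j != i)) / A[i][i]
      return new

  A = [[4.0, 1.0], [2.0, 5.0]]   # strictly diagonally dominant
  b = [6.0, 12.0]                # exact solution (1, 2)
  xj = [0.0, 0.0]
  xg = [0.0, 0.0]
  for _ in range(25):
      xj = jacobi_step(A, b, xj)
      xg = gauss_seidel_step(A, b, xg)
  ```

  Both sequences converge to <math|<around*|(|1,2|)>>; the Gauss-Seidel
  iterate is typically much closer after the same number of sweeps,
  consistent with note (ii) above.
  
  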

  <subsubsection|Successive over-relaxation (SOR) method>

  I found the Aitken method to be much faster than the SOR method, so this
  method is omitted.

  <subsubsection|Aitken method and its super version>

  It has been shown that the Aitken method can be used in iterative methods
  to solve linear eqns, so in this part we give the code.

  <\scm-code>
    ;;\<less\>code\<gtr\>--- super-Jacobi/GaussSiedel method

    ;;;;;-----------Aitken acceleration--------------------

    (define (Aitken-acceleration strm)

    \ \ (let* ([s0 (stream-ref strm 0)]

    \ \ \ \ \ \ \ \ \ [s1 (stream-ref strm 1)]

    \ \ \ \ \ \ \ \ \ [s2 (stream-ref strm 2)]

    \ \ \ \ \ \ \ \ \ [news (vector-map (lambda (e0 e1 e2);;vector-map

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (if (zero? (+
    e0 e2 (* -2 e1)));;denominator is 0

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ e2

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (- e2
    (/ (expt (- e2 e1) 2)

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (+
    e0 e2 (* -2 e1))))))

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ s0 s1 s2)])

    \ \ \ \ (stream-cons news (Aitken-acceleration (stream-cdr strm)))))

    ;;;;;-------------super acceleration--------------------

    (define (super-Jacobi-strm A b n x0)

    \ \ (super-acclerator Aitken-acceleration (Jacobi-stream-from A b n x0)))

    (define (super-GaussSiedel-strm A b n x0)

    \ \ (super-acclerator Aitken-acceleration (GaussSiedel-stream-from A b n
    x0)))

    \;
  </scm-code>

  <subsubsection|The convergence problem>

  One question has not been answered yet: when do the Jacobi and
  Gauss-Seidel methods work, and when do they fail? The theorem given before
  only refers to the iteration matrix <math|G>, not to <math|A>.

  Here is a theorem that tells when the Jacobi and Gauss-Seidel methods work:

  <\theorem>
    If <math|A> is strictly diagonally dominant, i.e.,

    <\equation*>
      <big|sum><rsub|j<around*|(|\<neq\>i|)>><around*|\||<frac|a<rsub|i
      j>|a<rsub|i i>>|\|>\<less\>1,\<forall\>i,
    </equation*>

    then both the Jacobi and Gauss-Seidel methods give the unique solution.
  </theorem>
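  The hypothesis is easy to test programmatically. A small Python sketch
  (illustrative only; the function name is ours) of the row criterion above:

  ```python
  def strictly_diagonally_dominant(A):
      """Row criterion: sum_{j != i} |a_ij| < |a_ii| for every row i."""
      n = len(A)
      return all(sum(abs(A[i][j]) for j in range(n) if j != i) < abs(A[i][i])
                 for i in range(n))

  print(strictly_diagonally_dominant([[4.0, 1.0], [2.0, 5.0]]))  # True: theorem applies
  print(strictly_diagonally_dominant([[1.0, 2.0], [3.0, 1.0]]))  # False: no guarantee
  ```
  
  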

  Here is another one which in some sense solves <strong|all problems>:

  <\theorem>
    If <math|A> is symmetric and positive definite, then the Gauss-Seidel
    method gives the unique solution.
  </theorem>

  <\note>
    Previously we said that to solve <math|A<wide|x|\<vect\>>=<wide|b|\<vect\>>>,
    we can solve <math|A<rsup|T>A<wide|x|\<vect\>>=A<rsup|T><wide|b|\<vect\>>>
    instead, and <math|A<rsup|T>A> is exactly symmetric and positive definite!
  </note>

  <subsection|Non-linear eqns>

  <strong|Problem:> Solve <math|<wide|x|\<vect\>>\<in\>\<bbb-R\><rsup|n>> of
  the non-linear eqns

  <\equation*>
    <wide|f|\<vect\>><around*|(|<wide|x|\<vect\>>|)>=0.
  </equation*>

  <subsubsection|Newton method>

  As in the one-variable case, suppose <math|<wide|x|\<vect\>><rsub|\<ast\>>>
  is the (unique) solution and <math|<wide|x|\<vect\>>> is an approximation
  of <math|<wide|x|\<vect\>><rsub|\<ast\>>>; by

  <\equation*>
    0=<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|\<ast\>>|)>\<simeq\><wide|f|\<vect\>><around*|(|<wide|x|\<vect\>>|)>+<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>>|)><around*|(|<wide|x|\<vect\>><rsub|\<ast\>>-<wide|x|\<vect\>>|)>,
  </equation*>

  where <math|<around*|(|<wide|f|\<vect\>><rprime|'><around*|(|x|)>|)><rsub|i
  j>\<assign\><around*|(|<frac|\<partial\>f<rsub|i>|\<partial\>x<rsub|j>>|)>>
  is the <em|Jacobi matrix>, i.e. the <em|Frechet derivative> of
  <math|<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>>|)>>. Then one can solve
  for <math|<wide|x|\<vect\>><rsub|\<ast\>>> as
  <math|<wide|x|\<vect\>><rsub|\<ast\>>=<wide|x|\<vect\>>-<wide|f|\<vect\>><rprime|'><around*|(|x|)><rsup|-1><wide|f|\<vect\>><around*|(|<wide|x|\<vect\>>|)>>,
  which gives an iterative method called the <strong|Newton method>:

  <\equation>
    <wide|x|\<vect\>><rsub|k+1>=<wide|x|\<vect\>><rsub|k>-<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)><rsup|-1><wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)><label|Newton-method-multivars>.
  </equation>

  To avoid computing the inverse <math|<wide|f|\<vect\>><rprime|'><rsup|-1>>,
  one can take the following two steps:

  <\eqnarray>
    <tformat|<table|<row|<cell|<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)><around*|(|-\<Delta\><wide|x|\<vect\>><rsub|k>|)>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>,>>|<row|<cell|<wide|x|\<vect\>><rsub|k+1>>|<cell|=>|<cell|<wide|x|\<vect\>><rsub|k>-<around*|(|-\<Delta\><wide|x|\<vect\>><rsub|k>|)>.>>>>
  </eqnarray>
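  The two-step scheme above can be sketched in Python for <math|n=2>, where
  the linear step equation is solved by Cramer's rule (an illustration only;
  the test system <math|x<rsup|2>+y<rsup|2>=4>, <math|x=y>, with root
  <math|<around*|(|<sqrt|2>,<sqrt|2>|)>>, is ours):

  ```python
  def newton_step(x, f, jac):
      """One Newton step: solve f'(x) dx = -f(x) by Cramer's rule
      for a 2x2 Jacobian, then return x + dx."""
      f1, f2 = f(x)
      (a, b), (c, d) = jac(x)
      det = a * d - b * c
      dx0 = (-f1 * d + b * f2) / det
      dx1 = (-a * f2 + c * f1) / det
      return (x[0] + dx0, x[1] + dx1)

  # system: x^2 + y^2 - 4 = 0 and x - y = 0
  f = lambda p: (p[0] ** 2 + p[1] ** 2 - 4.0, p[0] - p[1])
  jac = lambda p: ((2.0 * p[0], 2.0 * p[1]), (1.0, -1.0))

  x = (1.0, 2.0)          # starting point
  for _ in range(20):
      x = newton_step(x, f, jac)
  # x is now very close to (sqrt(2), sqrt(2))
  ```
  
  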

  <\note>
    The derivative <math|<wide|f|\<vect\>><rprime|'><rsub|i
    j>=<frac|\<partial\>f<rsub|i>|\<partial\>x<rsub|j>><around*|\||<wide|x|\<vect\>><rsub|k>|\<nobracket\>>>
    is still hard to compute, but one can use the secant method, similar to
    the one-variable case, as shown in the next subsubsection.
  </note>

  <subsubsection|Newton-secant method>

  \;

  We need <math|<around*|(|n+1|)>> points
  <math|<wide|\<xi\>|\<vect\>><rsub|k,0><around*|(|\<equiv\><wide|x|\<vect\>><rsub|k>|)>,<wide|\<xi\>|\<vect\>><rsub|k,1>\<cdots\>,<wide|\<xi\>|\<vect\>><rsub|k,n>>
  and the values of <math|<wide|f|\<vect\>>> at these points, such that

  <\equation*>
    <around*|(|<wide|\<xi\>|\<vect\>><rsub|k,0>,f<rsub|i><around*|(|<wide|\<xi\>|\<vect\>><rsub|k,0>|)>|)>,\<cdots\>,<around*|(|<wide|\<xi\>|\<vect\>><rsub|k,n>,f<rsub|i><around*|(|<wide|\<xi\>|\<vect\>><rsub|k,n>|)>|)>
  </equation*>

  span a hyperplane in <math|<around*|(|<wide|x|\<vect\>>,z<rsub|i>|)>>-space,
  whose expression is <math|z<rsub|i>=<wide|a|\<vect\>><rsup|i,T><wide|x|\<vect\>>+b<rsup|i>>
  and which approximates <math|z<rsub|i>=f<rsub|i><around*|(|<wide|x|\<vect\>>|)>>
  near <math|<around*|(|<wide|x|\<vect\>><rsub|k>,f<rsub|i><around*|(|<wide|x|\<vect\>><rsub|k>|)>|)>>
  for all <math|i=1,\<cdots\>,n>. Thus we have the approximation
  <math|<frac|\<partial\>f<rsub|i>|\<partial\>x<rsub|j>><around*|\||<wide|x|\<vect\>><rsub|k>|\<nobracket\>>\<simeq\><wide|a|\<vect\>><rsup|i><around*|[|j|]>>.

  Those <math|<wide|a|\<vect\>><rsup|i>> and <math|b<rsup|i>> satisfy

  <\equation*>
    f<rsub|i><around*|(|<wide|\<xi\>|\<vect\>><rsub|k,j>|)>=<wide|a|\<vect\>><rsup|i,T><wide|\<xi\>|\<vect\>><rsub|k,j>+b<rsup|i>,\<forall\>i=1,\<cdots\>,n;j=0,1,\<cdots\>,n.
  </equation*>

  Subtracting the 0th eqn from the jth, one gets

  <\equation*>
    f<rsub|i><around*|(|<wide|\<xi\>|\<vect\>><rsub|k,j>|)>-f<rsub|i><around*|(|<wide|x|\<vect\>><rsub|k>|)>=<wide|a|\<vect\>><rsup|i,T><around*|(|<wide|\<xi\>|\<vect\>><rsub|k,j>-<wide|x|\<vect\>><rsub|k>|)>,i,j=1,\<cdots\>,n,
  </equation*>

  which is the <math|<around*|(|i,j|)>>-element of

  <\equation*>
    \<Gamma\><rsub|k>=A<rsub|k>H<rsub|k>,
  </equation*>

  where\ 

  <\eqnarray>
    <tformat|<table|<row|<cell|\<Gamma\><rsub|k>>|<cell|\<assign\>>|<cell|<around*|[|<wide|f|\<vect\>><around*|(|<wide|\<xi\>|\<vect\>><rsub|k,1>|)>-<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>,\<ldots\>,<wide|f|\<vect\>><around*|(|<wide|\<xi\>|\<vect\>><rsub|k,n>|)>-<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>|]>,>>|<row|<cell|A<rsub|k>>|<cell|\<assign\>>|<cell|<around*|[|<tabular*|<tformat|<table|<row|<cell|<wide|a|\<vect\>><rsup|1,T>>>|<row|<cell|\<vdots\>>>|<row|<cell|<wide|a|\<vect\>><rsup|n,T>>>>>>|]>,>>|<row|<cell|H<rsub|k>>|<cell|\<assign\>>|<cell|<around*|[|<wide|\<xi\>|\<vect\>><rsub|k,1>-<wide|x|\<vect\>><rsub|k>,\<cdots\>,<wide|\<xi\>|\<vect\>><rsub|k,n>-<wide|x|\<vect\>><rsub|k>|]>.>>>>
  </eqnarray>

  Note that <math|<around*|[|<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)>|]><rsub|i
  j>\<simeq\><wide|a|\<vect\>><rsup|i><around*|[|j|]>=<around*|[|A<rsub|k>|]><rsub|i
  j>>, and we get the <strong|Newton-secant method>:

  <\equation*>
    <wide|x|\<vect\>><rsub|k+1>=<wide|x|\<vect\>><rsub|k>-A<rsub|k><rsup|-1><wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>=<wide|x|\<vect\>><rsub|k>-H<rsub|k>\<Gamma\><rsub|k><rsup|-1><wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>,
  </equation*>

  whose computation procedure is

  <\equation>
    <choice|<tformat|<table|<row|<cell|<wide|x|\<vect\>><rsub|k+1>>|<cell|=>|<cell|<wide|x|\<vect\>><rsub|k>-H<rsub|k><wide|z|\<vect\>><rsub|k>>>|<row|<cell|\<Gamma\><rsub|k><wide|z|\<vect\>><rsub|k>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>>>>>><label|Newton-Sceant-multivars>.
  </equation>

  Now we need to choose the <math|n> points
  <math|<around*|(|<wide|\<xi\>|\<vect\>><rsub|k,1>,\<cdots\>,<wide|\<xi\>|\<vect\>><rsub|k,n>|)>>.

  These points generally depend on <math|<wide|x|\<vect\>><rsub|k>,<wide|x|\<vect\>><rsub|k-1>,\<cdots\>,<wide|x|\<vect\>><rsub|0>>;
  if <math|p> points are used, we call it the \P<em|<math|p>-point secant
  method>\Q, while if <math|<around*|(|<wide|x|\<vect\>><rsub|k>,\<cdots\>,<wide|x|\<vect\>><rsub|k-p+1>|)>>
  are used, we call it the \P<em|<math|p>-point sequential secant
  method>\Q.

  Here we give the \P<strong|2-point sequential secant method>\Q:

  We use only the two points <math|<around*|(|<wide|x|\<vect\>><rsub|k>,<wide|x|\<vect\>><rsub|k-1>|)>>
  and let

  <\equation*>
    <choice|<tformat|<table|<row|<cell|h<rsub|k,j>\<assign\><wide|x|\<vect\>><rsub|k-1><around*|[|j|]>-<wide|x|\<vect\>><rsub|k><around*|[|j|]>>>|<row|<cell|<wide|\<xi\>|\<vect\>><rsub|k,j>\<assign\><wide|x|\<vect\>><rsub|k>+h<rsub|k,j><wide|e|\<vect\>><rsub|j>>>>>>,
  </equation*>

  where <math| j=1,\<cdots\>,n> and <math|<wide|e|\<vect\>><rsub|j>=<around*|[|0,\<ldots\>,1<rsub|j>,\<ldots\>0|]><rsup|T>>.

  Thus,

  <\eqnarray>
    <tformat|<table|<row|<cell|H<rsub|k>>|<cell|=>|<cell|diag<around*|(|h<rsub|k,1>,\<cdots\>,h<rsub|k,n>|)>,>>|<row|<cell|\<Gamma\><rsub|k>>|<cell|=>|<cell|<around*|[|<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>+h<rsub|k,1><wide|e|\<vect\>><rsub|1>|)>-<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>,\<cdots\>,<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>+h<rsub|k,n><wide|e|\<vect\>><rsub|n>|)>-<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>|]>.>>>>
  </eqnarray>
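  A sketch in Python of how <math|H<rsub|k>> and <math|\<Gamma\><rsub|k>>
  are assembled in this 2-point sequential secant choice (illustrative; the
  helper name `secant_jacobian` is ours). For a linear <math|<wide|f|\<vect\>>>
  the ratio <math|\<Gamma\><around*|[|i|]><around*|[|j|]>/h<rsub|j>>
  reproduces the Jacobian exactly:

  ```python
  def secant_jacobian(f, xk, xkm1):
      """Build the diagonal of H_k (as a list) and Gamma_k column by column,
      with h_{k,j} = x_{k-1}[j] - x_k[j] and xi_{k,j} = x_k + h_{k,j} e_j."""
      n = len(xk)
      fk = f(xk)
      h = [xkm1[j] - xk[j] for j in range(n)]
      Gamma = [[0.0] * n for _ in range(n)]
      for j in range(n):
          xi = list(xk)
          xi[j] += h[j]          # the point xi_{k,j}
          fxi = f(xi)
          for i in range(n):
              Gamma[i][j] = fxi[i] - fk[i]
      return h, Gamma

  # a linear test function whose true Jacobian is [[2, 1], [1, 3]]
  f = lambda x: (2.0 * x[0] + x[1], x[0] + 3.0 * x[1])
  h, Gamma = secant_jacobian(f, [0.0, 0.0], [0.5, 0.25])
  # here Gamma[i][j] / h[j] recovers the Jacobian entries exactly
  ```
  
  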

  [See P197 of Ref-1, where an alternative method that saves some
  calculation is given.]

  Note that for the Newton-secant method, in each step one has to solve a
  linear system, which is a lot of work.

  <\note>
    Although the Newton-secant method gives a way to approximate the
    Jacobian in the Newton method, it still needs to solve a linear system
    in each step, which is a lot of work.

    Quasi-Newton methods can avoid the linear-system solve and give the
    iteration directly; see the next subsubsection.
  </note>

  <subsubsection|Quasi-Newton method>

  Both the Newton method and the Newton-secant method solve eqns to get an
  iterative relation of the form

  <\equation*>
    <wide|x|\<vect\>><rsub|k+1>=<wide|x|\<vect\>><rsub|k>-A<rsub|k><rsup|-1><wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>\<equiv\><wide|x|\<vect\>><rsub|k>-H<rsub|k><wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>.
  </equation*>

  In the Newton method, <math|A<rsub|k>=<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)>>,
  and in the Newton-secant method, <math|A<rsub|k>> is an approximation of
  <math|<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)>>
  by secants.

  Let

  <\eqnarray>
    <tformat|<table|<row|<cell|\<Delta\><wide|x|\<vect\>><rsub|k>>|<cell|\<assign\>>|<cell|<wide|x|\<vect\>><rsub|k+1>-<wide|x|\<vect\>><rsub|k>,>>|<row|<cell|\<Delta\><wide|y|\<vect\>><rsub|k>>|<cell|\<assign\>>|<cell|<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k+1>|)>-<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>,>>>>
  </eqnarray>

  and since

  <\equation>
    <wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k+1>|)>-<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>\<simeq\><wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)>\<Delta\><wide|x|\<vect\>><rsub|k><label|linear-approx>,
  </equation>

  one can get (let <math|A<rsub|k>=<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)>>
  or <math|H<rsub|k>=<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)><rsup|-1>>)

  <\equation*>
    \<Delta\><wide|y|\<vect\>><rsub|k>=A<rsub|k>\<Delta\><wide|x|\<vect\>><rsub|k>,
  </equation*>

  or

  <\equation>
    \<Delta\><wide|x|\<vect\>><rsub|k>=H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k><label|iterative-eqns>.
  </equation>

  Note that what we need is to make eqns (<reference|iterative-eqns>) hold;
  <math|H<rsub|k>> need not approximate
  <math|<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)><rsup|-1>>,
  which opens the road to other methods.

  A <em|quasi-Newton method> gives an iterative relation for
  <math|<around*|{|A<rsub|k>|}>> or <math|<around*|{|H<rsub|k>|}>> of the form

  <\equation>
    H<rsub|k+1>=H<rsub|k>+E<rsub|k><label|quasi-newton>,
  </equation>

  making the iterative eqns (<reference|iterative-eqns>) hold, where
  <math|E<rsub|k>> is called the <strong|correcting matrix>. Different
  choices of <math|E<rsub|k>> define different quasi-Newton methods.

  Here we give the best-known framework, the <strong|BFGS method> (from
  Broyden, Fletcher, Goldfarb and Shanno), where <math|E<rsub|k>> is a
  rank-2 matrix.

  Let <math|E<rsub|k>> be a rank-2 matrix,

  <\equation*>
    E<rsub|k>=U<rsub|k>V<rsub|k><rsup|T>,
  </equation*>

  where <math|U<rsub|k>,V<rsub|k>> are <math|n\<times\>2> matrices, with
  <math|U<rsub|k>=<around*|[|<wide|u|\<vect\>><rsub|k><rsup|<around*|(|1|)>>,<wide|u|\<vect\>><rsub|k><rsup|<around*|(|2|)>>|]>,V<rsub|k>=<around*|[|<wide|v|\<vect\>><rsub|k><rsup|<around*|(|1|)>>,<wide|v|\<vect\>><rsub|k><rsup|<around*|(|2|)>>|]>>;
  then

  <\equation*>
    E<rsub|k>=<wide|u|\<vect\>><rsub|k><rsup|<around*|(|1|)>><wide|v|\<vect\>><rsub|k><rsup|<around*|(|1|)>T>+<wide|u|\<vect\>><rsub|k><rsup|<around*|(|2|)>><wide|v|\<vect\>><rsub|k><rsup|<around*|(|2|)>T>.
  </equation*>

  Putting the above expression into the iterative
  eqns (<reference|iterative-eqns>), one gets

  <\equation*>
    <around*|(|H<rsub|k>+<wide|u|\<vect\>><rsub|k><rsup|<around*|(|1|)>><wide|v|\<vect\>><rsub|k><rsup|<around*|(|1|)>T>+<wide|u|\<vect\>><rsub|k><rsup|<around*|(|2|)>><wide|v|\<vect\>><rsub|k><rsup|<around*|(|2|)>T>|)>\<Delta\><wide|y|\<vect\>><rsub|k>=\<Delta\><wide|x|\<vect\>><rsub|k>,
  </equation*>

  or

  <\equation*>
    <around*|(|<wide|u|\<vect\>><rsub|k><rsup|<around*|(|1|)>><wide|v|\<vect\>><rsub|k><rsup|<around*|(|1|)>T>+<wide|u|\<vect\>><rsub|k><rsup|<around*|(|2|)>><wide|v|\<vect\>><rsub|k><rsup|<around*|(|2|)>T>|)>\<Delta\><wide|y|\<vect\>><rsub|k>=\<Delta\><wide|x|\<vect\>><rsub|k>-H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k>.
  </equation*>

  First, let <math|<wide|u|\<vect\>><rsub|k><rsup|<around*|(|1|)>>=\<Delta\><wide|x|\<vect\>><rsub|k>>
  and <math|<wide|u|\<vect\>><rsub|k><rsup|<around*|(|2|)>>=-H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k>>,
  then

  <\equation*>
    \<Delta\><wide|x|\<vect\>><rsub|k><wide|v|\<vect\>><rsub|k><rsup|<around*|(|1|)>T>\<Delta\><wide|y|\<vect\>><rsub|k>-H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k><wide|v|\<vect\>><rsub|k><rsup|<around*|(|2|)>T>\<Delta\><wide|y|\<vect\>><rsub|k>=\<Delta\><wide|x|\<vect\>><rsub|k>-H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k>.
  </equation*>

  Second, if we find <math|<wide|v|\<vect\>><rsub|k><rsup|<around*|(|1|)>>,<wide|v|\<vect\>><rsub|k><rsup|<around*|(|2|)>>>
  such that <math|<wide|v|\<vect\>><rsub|k><rsup|<around*|(|1|)>T>\<Delta\><wide|y|\<vect\>><rsub|k>=1=<wide|v|\<vect\>><rsub|k><rsup|<around*|(|2|)>T>\<Delta\><wide|y|\<vect\>><rsub|k>>,
  then the iterative eqns (<reference|iterative-eqns>) are satisfied.

  The following choice satisfies the iterative eqns (<reference|iterative-eqns>):

  <\eqnarray>
    <tformat|<table|<row|<cell|<wide|v|\<vect\>><rsup|<around*|(|1|)>T>>|<cell|\<assign\>>|<cell|<around*|(|1+\<beta\>\<Delta\><wide|y|\<vect\>><rsub|k><rsup|T>H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k>|)><frac|\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>|\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>\<Delta\><wide|y|\<vect\>><rsub|k>>-\<beta\>\<Delta\><wide|y|\<vect\>><rsub|k><rsup|T>H<rsub|k>,>>|<row|<cell|<wide|v|\<vect\>><rsup|<around*|(|2|)>T>>|<cell|\<assign\>>|<cell|<around*|(|1-\<beta\>\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>\<Delta\><wide|y|\<vect\>><rsub|k>|)><frac|\<Delta\><wide|y|\<vect\>><rsub|k><rsup|T>H<rsub|k>|\<Delta\><wide|y|\<vect\>><rsub|k><rsup|T>H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k>>+\<beta\>\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>,>>>>
  </eqnarray>

  where <math|\<beta\>\<in\>\<bbb-R\>> is to be chosen; each choice defines
  a rank-2 method.

  The <strong|BFGS method> chooses

  <\equation*>
    \<beta\>=<frac|1|\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>\<Delta\><wide|y|\<vect\>><rsub|k>>,
  </equation*>

  and eqns (<reference|quasi-newton>) become

  <\eqnarray>
    <tformat|<table|<row|<cell|H<rsub|k+1>>|<cell|=>|<cell|H<rsub|k>+<frac|1|\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>\<Delta\><wide|y|\<vect\>><rsub|k>><around*|(|\<mu\><rsub|k>\<Delta\><wide|x|\<vect\>><rsub|k>\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>-H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k>\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>-\<Delta\><wide|x|\<vect\>><rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k><rsup|T>H<rsub|k>|)>,>>|<row|<cell|\<mu\><rsub|k>>|<cell|\<assign\>>|<cell|1+<frac|1|\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>\<Delta\><wide|y|\<vect\>><rsub|k>>\<Delta\><wide|y|\<vect\>><rsub|k><rsup|T>H<rsub|k>\<Delta\><wide|y|\<vect\>><rsub|k>.>>>>
  </eqnarray>

  By combining these eqns with (<reference|quasi-newton>) and

  <\equation*>
    <wide|x|\<vect\>><rsub|k+1>=<wide|x|\<vect\>><rsub|k>-H<rsub|k><wide|f|\<vect\>><around*|(|<wide|x|\<vect\>><rsub|k>|)>,
  </equation*>

  one gets a quasi-Newton method: the BFGS method.

  Note that one needs to give initial values <math|<wide|x|\<vect\>><rsub|0>>
  and <math|H<rsub|0>> to start the iterations:
  <math|<around*|(|<wide|x|\<vect\>><rsub|1>,\<Delta\><wide|x|\<vect\>><rsub|0>,\<Delta\><wide|y|\<vect\>><rsub|0>,H<rsub|1>|)>\<Rightarrow\><around*|(|<wide|x|\<vect\>><rsub|2>,\<Delta\><wide|x|\<vect\>><rsub|1>,\<Delta\><wide|y|\<vect\>><rsub|1>,H<rsub|2>|)>\<Rightarrow\>\<cdots\>>.
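  The whole scheme can be sketched in Python (an illustration, not a
  reference implementation; the helper names are ours, and the linear test
  system is chosen so that <math|\<Delta\><wide|x|\<vect\>><rsub|k><rsup|T>\<Delta\><wide|y|\<vect\>><rsub|k>\<gtr\>0>
  along the iteration):

  ```python
  def dot(u, v):
      return sum(a * b for a, b in zip(u, v))

  def mat_vec(H, v):
      return [dot(row, v) for row in H]

  def bfgs_update(H, s, y):
      """Rank-2 inverse update with beta = 1/(s^T y): the formula for H_{k+1} above."""
      n = len(s)
      sy = dot(s, y)
      Hy = mat_vec(H, y)
      yH = [sum(y[i] * H[i][j] for i in range(n)) for j in range(n)]  # row vector y^T H
      mu = 1.0 + dot(y, Hy) / sy
      return [[H[i][j] + (mu * s[i] * s[j] - Hy[i] * s[j] - s[i] * yH[j]) / sy
               for j in range(n)] for i in range(n)]

  def quasi_newton(f, x, H, iters=30):
      """x_{k+1} = x_k - H_k f(x_k), updating H_k by the rank-2 formula."""
      fx = f(x)
      for _ in range(iters):
          step = mat_vec(H, fx)
          xn = [x[i] - step[i] for i in range(len(x))]
          fn = f(xn)
          s = [xn[i] - x[i] for i in range(len(x))]
          y = [fn[i] - fx[i] for i in range(len(x))]
          if abs(dot(s, y)) > 1e-14:   # skip the update when s^T y ~ 0
              H = bfgs_update(H, s, y)
          x, fx = xn, fn
      return x

  # linear test system f(x) = (2 x0 - 2, 3 x1 - 6), root (1, 2);
  # H_0 is a rough guess at the inverse Jacobian diag(1/2, 1/3)
  root = quasi_newton(lambda x: (2.0 * x[0] - 2.0, 3.0 * x[1] - 6.0),
                      [0.0, 0.0], [[0.4, 0.0], [0.0, 0.3]])
  ```

  Each update enforces the secant relation
  <math|H<rsub|k+1>\<Delta\><wide|y|\<vect\>><rsub|k>=\<Delta\><wide|x|\<vect\>><rsub|k>>,
  so <math|H<rsub|k>> drifts toward the true inverse Jacobian without any
  linear solve.
  
  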

  <\note>
    (i) Quasi-Newton methods do not solve linear eqns in each step, so they
    can be much faster than the Newton-secant method.

    (ii) <math|\<beta\>=0> defines the DFP (Davidon, Fletcher and Powell)
    method, which is less stable than the BFGS method.

    (iii) Quasi-Newton methods and the Newton(-secant) method have the
    disadvantage that they converge only locally, which means one needs good
    initial points to start the iteration. The Levenberg-Marquardt method is
    an improvement of the Newton(-secant) method with good convergence
    properties; see the next subsubsection.
  </note>

  <subsubsection|Levenberg-Marquardt method>

  Both the Newton(-secant) method and quasi-Newton methods face the
  convergence problem, especially when the Jacobi (or quasi-Jacobi) matrix
  is singular.

  To avoid this problem, one can transform the equation-solving problem into
  the problem of finding the minimum of the function:

  <\equation*>
    <below|min|d\<in\>\<bbb-R\><rsup|n>><frac|1|2><around*|\<\|\|\>|<wide|f|\<vect\>><rsub|k>+J<rsub|k><wide|d|\<vect\>>|\<\|\|\>><rsup|2>,
  </equation*>

  where <math|J<rsub|k>> is <math|<wide|f|\<vect\>><rprime|'><around*|(|<wide|x|\<vect\>><rsub|k>|)>>
  in the Newton method or <math|A<rsub|k>> in the Newton-secant method. After
  finding the minimum point <math|<wide|d|\<vect\>>>, one gets the next point
  <math|<wide|x|\<vect\>><rsub|k+1>=<wide|x|\<vect\>><rsub|k>+<wide|d|\<vect\>>>.

  If <math|J<rsub|k><rsup|T>J<rsub|k>> is non-degenerate, the solution is
  <math|<wide|d|\<vect\>>=-<around*|(|J<rsub|k><rsup|T>J<rsub|k>|)><rsup|-1>J<rsub|k><rsup|T><wide|f|\<vect\>><rsub|k><around*|(|=-J<rsub|k><rsup|-1><wide|f|\<vect\>><rsub|k>|)>>,
  and the next point is

  <\equation*>
    <wide|x|\<vect\>><rsub|k+1>=<wide|x|\<vect\>><rsub|k>-<around*|(|J<rsub|k><rsup|T>J<rsub|k>|)><rsup|-1>J<rsub|k><rsup|T><wide|f|\<vect\>><rsub|k>.
  </equation*>

  But if <math|J<rsub|k><rsup|T>J<rsub|k>> is degenerate, one can introduce
  a parameter <math|\<lambda\><rsub|k>\<gtr\>0> making
  <math|J<rsub|k><rsup|T>J<rsub|k>+\<lambda\><rsub|k>E> non-degenerate, and
  consider the modified version, called the <strong|Levenberg-Marquardt method>:

  <\equation*>
    <wide|x|\<vect\>><rsub|k+1>=<wide|x|\<vect\>><rsub|k>-<around*|(|J<rsub|k><rsup|T>J<rsub|k>+\<lambda\><rsub|k>E|)><rsup|-1>J<rsub|k><rsup|T><wide|f|\<vect\>><rsub|k>.
  </equation*>
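  A Python sketch of this iteration for <math|n=2>, with
  <math|\<lambda\><rsub|k>=<around*|\<\|\|\>|<wide|f|\<vect\>><rsub|k>|\<\|\|\>>>
  (an illustration only; the <math|2\<times\>2> solve uses Cramer's rule,
  and the test system <math|x<rsup|2>+y<rsup|2>=4>, <math|x=y> is ours):

  ```python
  import math

  def lm_step(J, fx, lam):
      """Solve (J^T J + lam I) d = -J^T f for a 2-variable system by Cramer's rule."""
      a = J[0][0] ** 2 + J[1][0] ** 2 + lam
      b = J[0][0] * J[0][1] + J[1][0] * J[1][1]
      d = J[0][1] ** 2 + J[1][1] ** 2 + lam
      g0 = -(J[0][0] * fx[0] + J[1][0] * fx[1])
      g1 = -(J[0][1] * fx[0] + J[1][1] * fx[1])
      det = a * d - b * b
      return ((g0 * d - b * g1) / det, (a * g1 - b * g0) / det)

  # test system: x^2 + y^2 - 4 = 0 and x - y = 0, root (sqrt(2), sqrt(2))
  f = lambda p: (p[0] ** 2 + p[1] ** 2 - 4.0, p[0] - p[1])
  jac = lambda p: ((2.0 * p[0], 2.0 * p[1]), (1.0, -1.0))

  x = (1.0, 2.0)
  for _ in range(100):
      fx = f(x)
      lam = math.hypot(fx[0], fx[1])    # lambda_k = ||f_k||
      dx, dy = lm_step(jac(x), fx, lam)
      x = (x[0] + dx, x[1] + dy)
  ```

  As the residual shrinks, <math|\<lambda\><rsub|k>\<rightarrow\>0> and the
  step approaches the plain Newton step.
  
  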

  It has been proved (Tseng et al.) that if
  <math|<wide|f|\<vect\>><around*|(|<wide|x|\<vect\>>|)>> satisfies a local
  error bound, then the above method gives the solution with
  <math|\<lambda\><rsub|k>=<around*|\<\|\|\>|<wide|f|\<vect\>><rsub|k>|\<\|\|\>><rsup|2>>
  (though <math|\<lambda\><rsub|k>=<around*|\<\|\|\>|<wide|f|\<vect\>><rsub|k>|\<\|\|\>>>
  seems better [see Ref-3]).

  First we give the Levenberg-Marquardt method with <math|J<rsub|k>> defined
  by the secant method (see the Newton-secant method); then we give the
  improved version, the <em|trust-region version of the Levenberg-Marquardt
  method>.

  <strong|Levenberg-Marquardt-secant method:>

  In the 2-point sequential secant method (see the part on the Newton-secant
  method), suppose we have two approximate solutions
  <math|<wide|x|\<vect\>>,<wide|y|\<vect\>>>; the Jacobian <math|J> can be
  approximated by

  <\equation*>
    J\<simeq\>\<Gamma\>H<rsup|-1>,
  </equation*>

  where <math|\<Gamma\>> and <math|H> are defined as follows:

  <\eqnarray>
    <tformat|<table|<row|<cell|h<rsub|j>>|<cell|=>|<cell|<wide|x|\<vect\>><around*|[|j|]>-<wide|y|\<vect\>><around*|[|j|]>,j=1,\<cdots\>,n;>>|<row|<cell|<wide|\<xi\>|\<vect\>><rsub|j>>|<cell|=>|<cell|<wide|y|\<vect\>>+h<rsub|j><wide|e|\<vect\>><rsub|j>;>>|<row|<cell|H>|<cell|=>|<cell|<around*|[|<wide|\<xi\>|\<vect\>><rsub|1>-<wide|y|\<vect\>>,\<cdots\>,<wide|\<xi\>|\<vect\>><rsub|n>-<wide|y|\<vect\>>|]>=diag<around*|(|h<rsub|1>,\<cdots\>,h<rsub|n>|)>;>>|<row|<cell|\<Gamma\>>|<cell|=>|<cell|<around*|[|<wide|f|\<vect\>><around*|(|<wide|\<xi\>|\<vect\>><rsub|1>|)>-<wide|f|\<vect\>><around*|(|<wide|y|\<vect\>>|)>,\<cdots\>,<wide|f|\<vect\>><around*|(|<wide|\<xi\>|\<vect\>><rsub|n>|)>-<wide|f|\<vect\>><around*|(|<wide|y|\<vect\>>|)>|]>.>>>>
  </eqnarray>

  So the Levenberg-Marquardt method becomes
  (<math|<wide|x|\<vect\>>,<wide|y|\<vect\>>\<Rightarrow\><wide|z|\<vect\>>>):

  <\eqnarray>
    <tformat|<table|<row|<cell|<wide|z|\<vect\>>>|<cell|=>|<cell|<wide|y|\<vect\>>-H<wide|\<delta\>|\<vect\>>,where>>|<row|<cell|<around*|(|\<Gamma\><rsup|T>\<Gamma\>+<around*|\<\|\|\>|<wide|f|\<vect\>><around*|(|<wide|y|\<vect\>>|)>|\<\|\|\>>H<rsup|2>|)><wide|\<delta\>|\<vect\>>>|<cell|=>|<cell|\<Gamma\><rsup|T><wide|f|\<vect\>><around*|(|*<wide|y|\<vect\>>|)>.>>>>
  </eqnarray>

  Note that <math|<around*|(|\<Gamma\><rsup|T>\<Gamma\>+<around*|\<\|\|\>|<wide|f|\<vect\>><around*|(|<wide|y|\<vect\>>|)>|\<\|\|\>>H<rsup|2>|)>>
  is a symmetric positive-definite matrix, so the Gauss-Seidel method or the
  modified least-squares method can be applied to it directly.
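  As an illustration, one step of the scheme above can be sketched in NumPy;
  the function name <verbatim|lm_secant_step> and the test problem are ours,
  and a plain dense solve stands in for the iterative solvers just mentioned.

```python
import numpy as np

def lm_secant_step(f, x, y):
    """One Levenberg-Marquardt-Secant step: from solutions x, y produce z.

    f maps an n-vector to an n-vector; h_j = x[j] - y[j] must be nonzero.
    """
    n = y.size
    h = x - y                                   # h_j = x[j] - y[j]
    fy = f(y)
    e = np.eye(n)
    # j-th column of Gamma is f(xi_j) - f(y), with xi_j = y + h_j e_j
    Gamma = np.column_stack([f(y + h[j] * e[:, j]) - fy for j in range(n)])
    # (Gamma^T Gamma + ||f(y)|| H^2) delta = Gamma^T f(y),  H = diag(h)
    A = Gamma.T @ Gamma + np.linalg.norm(fy) * np.diag(h**2)
    delta = np.linalg.solve(A, Gamma.T @ fy)
    return y - h * delta                        # z = y - H delta
```

  Iterating with <math|<around*|(|<wide|x|\<vect\>>,<wide|y|\<vect\>>|)>\<leftarrow\><around*|(|<wide|y|\<vect\>>,<wide|z|\<vect\>>|)>>
  gives secant-style convergence near a root.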

  \;

  \;

  <section|Ordinary differential eqns: initial-value problems>

  <strong|Problem:> Solve for <math|<wide|x|\<vect\>><around*|(|t|)>> in the
  system of ODEs with initial values

  <\eqnarray>
    <tformat|<table|<row|<cell|<frac|d<wide|x|\<vect\>>|d
    t>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|t,<wide|x|\<vect\>>|)>,>>|<row|<cell|<wide|x|\<vect\>><around*|(|t<rsub|0>|)>>|<cell|=>|<cell|<wide|\<eta\>|\<vect\>>.>>>>
  </eqnarray>

  In a numerical method, one cannot obtain the solution as a function, but
  only a sequence of values of <math|<wide|x|\<vect\>>> at discrete times:

  <\equation*>
    <around*|(|t<rsub|0>,<wide|\<eta\>|\<vect\>>|)>,<around*|(|t<rsub|1>,<wide|x|\<vect\>><rsub|1>|)>,\<cdots\>,<around*|(|t<rsub|n>,<wide|x|\<vect\>><rsub|n>|)>,\<cdots\>.
  </equation*>

  To obtain this sequence, one usually finds an iterative relation between
  <math|<wide|x|\<vect\>><rsub|n>> and <math|<wide|x|\<vect\>><rsub|n-1>,\<cdots\>,<wide|x|\<vect\>><rsub|n-p>>,
  where the case <math|p=1> is called a <em|one-step method>, while the
  others are called <em|multistep methods>.

  The most famous one-step method is the <strong|Runge-Kutta method>, while
  the most famous multistep method is the <strong|Adams method>.

  On the other hand, the iterative relation may be explicit or implicit, so
  these methods can also be classified into <em|explicit-form methods> and
  <em|implicit-form methods>.

  The advantages and disadvantages of each method will be discussed in the
  following parts.

  <subsection|One-step methods: Runge-Kutta method>

  A one-step method seeks an iterative relation of the form (letting
  <math|h<rsub|n>=t<rsub|n+1>-t<rsub|n>>),

  (explicit-form method)

  <\equation*>
    <wide|x|\<vect\>><rsub|n+1>=<wide|x|\<vect\>><rsub|n>+h<rsub|n><wide|\<Phi\>|\<vect\>><around*|(|t<rsub|n>,<wide|x|\<vect\>><rsub|n>,h<rsub|n>|)>,
  </equation*>

  or (implicit-form method)

  <\equation*>
    <wide|x|\<vect\>><rsub|n+1>=<wide|x|\<vect\>><rsub|n>+h<rsub|n><wide|\<Phi\>|\<vect\>><around*|(|t<rsub|n>,<wide|x|\<vect\>><rsub|n>,<wide|x|\<vect\>><rsub|n+1>,h<rsub|n>|)>.
  </equation*>

  The Runge-Kutta method is a one-step method which can be derived by Taylor
  expansion.

  We take the 2nd-order RK method as an example; higher-order RK methods can
  be derived in the same way but are much more complicated.

  Consider

  <\eqnarray>
    <tformat|<table|<row|<cell|<wide|x|\<vect\>><around*|(|t+h|)>>|<cell|=>|<cell|<wide|x|\<vect\>><around*|(|t|)>+h<wide|<wide|x|\<vect\>>|\<dot\>><around*|(|t|)>+<frac|h<rsup|2>|2!><wide|<wide|x|\<vect\>>|\<ddot\>><around*|(|t|)>+O<around*|(|h<rsup|3>|)>>>|<row|<cell|>|<cell|=>|<cell|<wide|x|\<vect\>><around*|(|t|)>+h<wide|f|\<vect\>><around*|(|t,<wide|x|\<vect\>><around*|(|t|)>|)>+<frac|h<rsup|2>|2!><wide|<wide|f|\<vect\>>|\<dot\>><around*|(|t,<wide|x|\<vect\>><around*|(|t|)>|)>+O<around*|(|h<rsup|3>|)>>>|<row|<cell|>|<cell|=>|<cell|<wide|x|\<vect\>><around*|(|t|)>+h<wide|f|\<vect\>><around*|(|t,<wide|x|\<vect\>><around*|(|t|)>|)>+<frac|h<rsup|2>|2!><around*|(|\<partial\><rsub|t><wide|f|\<vect\>>+<wide|<wide|x|\<vect\>>|\<dot\>>\<cdot\>\<partial\><rsub|<wide|x|\<vect\>>><wide|f|\<vect\>>|)><around*|\|||\<nobracket\>><rsub|<around*|(|t,<wide|x|\<vect\>>|)>>+O<around*|(|h<rsup|3>|)>>>|<row|<cell|>|<cell|=>|<cell|<wide|x|\<vect\>><around*|(|t|)>+h<wide|f|\<vect\>><around*|(|t,<wide|x|\<vect\>><around*|(|t|)>|)>+<frac|h<rsup|2>|2!><around*|(|\<partial\><rsub|t><wide|f|\<vect\>>+<wide|f|\<vect\>>\<cdot\>\<partial\><rsub|<wide|x|\<vect\>>><wide|f|\<vect\>>|)><around*|\|||\<nobracket\>><rsub|<around*|(|t,<wide|x|\<vect\>>|)>>+O<around*|(|h<rsup|3>|)>>>>>
  </eqnarray>

  This gives the iteration function to 2nd order:

  <\equation*>
    <wide|\<Phi\>|\<vect\>><around*|(|t,<wide|x|\<vect\>>,h|)>=<wide|f|\<vect\>>+<frac|h|2><around*|(|\<partial\><rsub|t><wide|f|\<vect\>>+<wide|f|\<vect\>>\<cdot\>\<partial\><rsub|<wide|x|\<vect\>>><wide|f|\<vect\>>|)>.
  </equation*>

  The key point is that evaluations of <math|<wide|f|\<vect\>>> at suitably
  chosen points can replace the derivatives; i.e., suppose that

  <\equation*>
    <wide|\<Phi\>|\<vect\>><around*|(|t,<wide|x|\<vect\>>,h|)>=c<rsub|1><wide|K|\<vect\>><rsub|1>+c<rsub|2><wide|K|\<vect\>><rsub|2>,
  </equation*>

  where

  <\eqnarray>
    <tformat|<table|<row|<cell|<wide|K|\<vect\>><rsub|1>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|t,<wide|x|\<vect\>>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|2>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|t+a<rsub|2>h,<wide|x|\<vect\>>+b<rsub|21>h
    <wide|K|\<vect\>><rsub|1>|)>>>|<row|<cell|>|<cell|=>|<cell|<wide|K|\<vect\>><rsub|1>+a<rsub|2>h\<partial\><rsub|t><wide|f|\<vect\>><around*|\||<rsub|<around*|(|t,<wide|x|\<vect\>>|)>>|\<nobracket\>>+b<rsub|21>h<wide|K|\<vect\>><rsub|1>\<cdot\>\<partial\><rsub|<wide|x|\<vect\>>><wide|f|\<vect\>><around*|\||<rsub|<around*|(|t,<wide|x|\<vect\>>|)>>|\<nobracket\>>+O<around*|(|h<rsup|2>|)>>>|<row|<cell|>|<cell|=>|<cell|<wide|K|\<vect\>><rsub|1>+a<rsub|2>h\<partial\><rsub|t><wide|f|\<vect\>><around*|\||<rsub|<around*|(|t,<wide|x|\<vect\>>|)>>|\<nobracket\>>+b<rsub|21>h<around*|(|<wide|f|\<vect\>>\<cdot\>\<partial\><rsub|<wide|x|\<vect\>>><wide|f|\<vect\>>|)><around*|\||<rsub|<around*|(|t,<wide|x|\<vect\>>|)>>|\<nobracket\>>+O<around*|(|h<rsup|2>|)>>>>>
  </eqnarray>

  So it requires

  <\equation*>
    <choice|<tformat|<table|<row|<cell|c<rsub|1>+c<rsub|2>=1,>>|<row|<cell|c<rsub|2>a<rsub|2>=<frac|1|2>,>>|<row|<cell|c<rsub|2>b<rsub|21>=<frac|1|2>.>>>>>
  </equation*>

  There are many choices (4 variables with 3 eqns) of the <math|c,a,b>s that
  satisfy these requirements:

  (i) <math|c<rsub|1>=c<rsub|2>=<frac|1|2>,a<rsub|2>=b<rsub|21>=1> gives the
  improved Euler method;

  (ii) <math|c<rsub|1>=0,c<rsub|2>=1,a<rsub|2>=b<rsub|21>=<frac|1|2>> gives
  the midpoint method;

  (iii) <math|c<rsub|1>=<frac|1|4>,c<rsub|2>=<frac|3|4>,a<rsub|2>=b<rsub|21>=<frac|2|3>>
  gives the Heun method; and so on.
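  The whole two-stage family can be written as one generic step; this is a
  minimal sketch (the names <verbatim|rk2_step> and <verbatim|SCHEMES> are
  ours), with the three parameter choices above collected in a table.

```python
import numpy as np

def rk2_step(f, t, x, h, c1, c2, a2, b21):
    """Generic 2-level 2nd-order RK step: x_{n+1} = x_n + h (c1 K1 + c2 K2)."""
    K1 = f(t, x)
    K2 = f(t + a2 * h, x + b21 * h * K1)
    return x + h * (c1 * K1 + c2 * K2)

# The three parameter choices listed above; each satisfies
# c1 + c2 = 1, c2*a2 = 1/2, c2*b21 = 1/2.
SCHEMES = {
    "improved Euler": dict(c1=1/2, c2=1/2, a2=1.0, b21=1.0),
    "midpoint":       dict(c1=0.0, c2=1.0, a2=1/2, b21=1/2),
    "Heun":           dict(c1=1/4, c2=3/4, a2=2/3, b21=2/3),
}
```

  On a smooth test problem such as <math|<wide|<wide|x|\<vect\>>|\<dot\>>=<wide|x|\<vect\>>>,
  all three choices show the same 2nd-order accuracy.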

  Here we give the frequently used <strong|explicit 4-level 4th-order
  Runge-Kutta method>:

  <\eqnarray>
    <tformat|<table|<row|<cell|<wide|x|\<vect\>><rsub|n+1>>|<cell|=>|<cell|<wide|x|\<vect\>><rsub|n>+<frac|h|6><around*|(|<wide|K|\<vect\>><rsub|1>+2<wide|K|\<vect\>><rsub|2>+2<wide|K|\<vect\>><rsub|3>+<wide|K|\<vect\>><rsub|4>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|1>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|t<rsub|n>,<wide|x|\<vect\>><rsub|n>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|2>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|t<rsub|n>+<frac|1|2>h,<wide|x|\<vect\>><rsub|n>+<frac|1|2>h<wide|K|\<vect\>><rsub|1>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|3>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|t<rsub|n>+<frac|1|2>h,<wide|x|\<vect\>><rsub|n>+<frac|1|2>h<wide|K|\<vect\>><rsub|2>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|4>>|<cell|=>|<cell|<wide|f|\<vect\>><around*|(|t<rsub|n>+h,<wide|x|\<vect\>><rsub|n>+h<wide|K|\<vect\>><rsub|3>|)>.>>>>
  </eqnarray>

  [4-level means that 4 <math|<wide|K|\<vect\>><rsub|i>>s are used; 4th-order
  means the global error is <math|O<around*|(|h<rsup|4>|)>>.]
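  This step translates directly into code; a minimal NumPy sketch (the helper
  name <verbatim|rk4_step> is illustrative):

```python
import numpy as np

def rk4_step(f, t, x, h):
    """One step of the classical explicit 4-level 4th-order Runge-Kutta method."""
    K1 = f(t, x)
    K2 = f(t + h/2, x + h/2 * K1)
    K3 = f(t + h/2, x + h/2 * K2)
    K4 = f(t + h, x + h * K3)
    return x + h/6 * (K1 + 2*K2 + 2*K3 + K4)
```

  For example, integrating the harmonic oscillator
  <math|<wide|<wide|x|\<vect\>>|\<ddot\>>=-<wide|x|\<vect\>>> (as a
  first-order system) over one period returns very close to the starting
  point.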

  Note that the step <math|h> above is taken to be constant, but the steps
  may be chosen freely.

  Here we give a widely used adaptive-step Runge-Kutta method, the
  <strong|Runge-Kutta-Fehlberg method>:

  Fehlberg (1970) found a 4th-order RK method that shares its
  <math|K<rsub|i>>s with a 5th-order RK method, so the 5th-order solution can
  be used to control the 4th-order solution by adapting the steps, without
  requiring many extra <math|K<rsub|i>>s. The result is as follows:

  <strong|RKF method:>

  4th-order RK:

  <\equation*>
    <wide|x|\<vect\>><rsub|n+1>=<wide|x|\<vect\>><rsub|n>+<frac|25|216><wide|K|\<vect\>><rsub|1>+<frac|1408|2565><wide|K|\<vect\>><rsub|3>+<frac|2197|4104><wide|K|\<vect\>><rsub|4>-<frac|1|5><wide|K|\<vect\>><rsub|5>,
  </equation*>

  5th-order RK:

  <\equation*>
    <wide|\<xi\>|\<vect\>><rsub|n+1>=<wide|x|\<vect\>><rsub|n>+<frac|16|135><wide|K|\<vect\>><rsub|1>+<frac|6656|12825><wide|K|\<vect\>><rsub|3>+<frac|28561|56430><wide|K|\<vect\>><rsub|4>-<frac|9|50><wide|K|\<vect\>><rsub|5>+<frac|2|55><wide|K|\<vect\>><rsub|6>,
  </equation*>

  where

  <\eqnarray>
    <tformat|<table|<row|<cell|<wide|K|\<vect\>><rsub|1>>|<cell|=>|<cell|h<wide|f|\<vect\>><around*|(|t<rsub|n>,<wide|x|\<vect\>><rsub|n>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|2>>|<cell|=>|<cell|h<wide|f|\<vect\>><around*|(|t<rsub|n>+<frac|h|4>,<wide|x|\<vect\>><rsub|n>+<frac|1|4><wide|K|\<vect\>><rsub|1>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|3>>|<cell|=>|<cell|h<wide|f|\<vect\>><around*|(|t<rsub|n>+<frac|3|8>h,<wide|x|\<vect\>><rsub|n>+<frac|3|32><wide|K|\<vect\>><rsub|1>+<frac|9|32><wide|K|\<vect\>><rsub|2>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|4>>|<cell|=>|<cell|h<wide|f|\<vect\>><around*|(|t<rsub|n>+<frac|12|13>h,<wide|x|\<vect\>><rsub|n>+<frac|1932|2197><wide|K|\<vect\>><rsub|1>-<frac|7200|2197><wide|K|\<vect\>><rsub|2>+<frac|7296|2197><wide|K|\<vect\>><rsub|3>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|5>>|<cell|=>|<cell|h<wide|f|\<vect\>><around*|(|t<rsub|n>+h,<wide|x|\<vect\>><rsub|n>+<frac|439|216><wide|K|\<vect\>><rsub|1>-8<wide|K|\<vect\>><rsub|2>+<frac|3680|513><wide|K|\<vect\>><rsub|3>-<frac|845|4104><wide|K|\<vect\>><rsub|4>|)>,>>|<row|<cell|<wide|K|\<vect\>><rsub|6>>|<cell|=>|<cell|h<wide|f|\<vect\>><around*|(|t<rsub|n>+<frac|h|2>,<wide|x|\<vect\>><rsub|n>-<frac|8|27><wide|K|\<vect\>><rsub|1>+2<wide|K|\<vect\>><rsub|2>-<frac|3544|2565><wide|K|\<vect\>><rsub|3>-<frac|1859|4104><wide|K|\<vect\>><rsub|4>-<frac|11|40><wide|K|\<vect\>><rsub|5>|)>.>>>>
  </eqnarray>

  The error <math|\<tau\><rsub|n+1>=<frac|<around*|\||<wide|x|\<vect\>><around*|(|t<rsub|n+1>|)>-<wide|x|\<vect\>><rsub|n+1>|\|>|h>\<sim\>\<lambda\>h<rsup|4>>
  can be approximated by <math|<around*|\||<wide|\<xi\>|\<vect\>><rsub|n+1>-<wide|x|\<vect\>><rsub|n+1>|\|>/h>,
  so when the error is larger than <math|TOL>, i.e.
  <math|<frac|<around*|\||<wide|\<xi\>|\<vect\>><rsub|n+1>-<wide|x|\<vect\>><rsub|n+1>|\|>|h>\<gtr\>TOL>,
  we can take a smaller step <math|<wide|h|~>\<equiv\>q h>, with
  <math|q=<around*|[|<frac|TOL|<around*|\||<wide|\<xi\>|\<vect\>><rsub|n+1>-<wide|x|\<vect\>><rsub|n+1>|\|>/h>|]><rsup|1/4>>,
  which, in practice, is taken as

  <\equation*>
    q=<around*|[|<frac|TOL|2<around*|\||<wide|\<xi\>|\<vect\>><rsub|n+1>-<wide|x|\<vect\>><rsub|n+1>|\|>/h>|]><rsup|1/4>\<simeq\>0.84<around*|[|<frac|TOL|R>|]><rsup|1/4>,
  </equation*>

  where <math|R> is the current error

  <\equation*>
    R\<equiv\><frac|<around*|\||<wide|\<xi\>|\<vect\>><rsub|n+1>-<wide|x|\<vect\>><rsub|n+1>|\|>|h>=<frac|1|h><around*|\||<frac|1|360><wide|K|\<vect\>><rsub|1>-<frac|128|4275><wide|K|\<vect\>><rsub|3>-<frac|2197|75240><wide|K|\<vect\>><rsub|4>+<frac|1|50><wide|K|\<vect\>><rsub|5>+<frac|2|55><wide|K|\<vect\>><rsub|6>|\|>.
  </equation*>
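  The whole accept/reject cycle can be sketched as follows; the driver
  <verbatim|rkf45>, its step-limit constants 0.1 and 4.0, and the fallback
  when <math|R=0> are our own illustrative safeguards, not part of the scheme
  in the text.

```python
import numpy as np

def rkf45_step(f, t, x, h):
    """Shared-K Fehlberg pair: return the 4th- and 5th-order results at t+h."""
    K1 = h * f(t, x)
    K2 = h * f(t + h/4, x + K1/4)
    K3 = h * f(t + 3*h/8, x + 3/32*K1 + 9/32*K2)
    K4 = h * f(t + 12*h/13, x + 1932/2197*K1 - 7200/2197*K2 + 7296/2197*K3)
    K5 = h * f(t + h, x + 439/216*K1 - 8*K2 + 3680/513*K3 - 845/4104*K4)
    K6 = h * f(t + h/2, x - 8/27*K1 + 2*K2 - 3544/2565*K3
                          - 1859/4104*K4 - 11/40*K5)
    x4 = x + 25/216*K1 + 1408/2565*K3 + 2197/4104*K4 - K5/5
    x5 = x + 16/135*K1 + 6656/12825*K3 + 28561/56430*K4 - 9/50*K5 + 2/55*K6
    return x4, x5

def rkf45(f, t0, x0, t_end, h, tol):
    """Adaptive driver: accept when R <= tol, rescale h by q = 0.84 (TOL/R)^(1/4)."""
    t, x = t0, np.asarray(x0, dtype=float)
    while t < t_end:
        h = min(h, t_end - t)
        x4, x5 = rkf45_step(f, t, x, h)
        R = np.linalg.norm(x5 - x4) / h          # current error estimate
        if R <= tol:                             # accept the 4th-order step
            t, x = t + h, x4
        q = 0.84 * (tol / R) ** 0.25 if R > 0 else 2.0
        h *= min(max(q, 0.1), 4.0)               # keep the rescaling moderate
    return t, x
```

  The same <math|q> is used both to shrink a rejected step and to grow an
  accepted one; clamping it avoids wild step-size oscillations.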

  <strong|Implicit RK method:>

  In an implicit one-step method, the iteration function
  <math|<wide|\<Phi\>|\<vect\>>> is also a function of
  <math|<wide|x|\<vect\>><rsub|n+1>>. Here we give only one implicit RK
  method, the 1-level 2nd-order (implicit midpoint) method:

  <\eqnarray>
    <tformat|<table|<row|<cell|<wide|x|\<vect\>><rsub|n+1>>|<cell|=>|<cell|<wide|x|\<vect\>><rsub|n>+<wide|K|\<vect\>><rsub|1>,>>|<row|<cell|<wide|K|\<vect\>><rsub|1>>|<cell|=>|<cell|h<wide|f|\<vect\>><around*|(|t<rsub|n>+<frac|h|2>,<wide|x|\<vect\>><rsub|n>+<frac|1|2><wide|K|\<vect\>><rsub|1>|)>.>>>>
  </eqnarray>

  Note that to get <math|<wide|x|\<vect\>><rsub|n+1>>, one needs to solve the
  second equation, which is non-linear, for
  <math|<wide|K|\<vect\>><rsub|1>>. Implicit methods are used to solve some
  special equations, such as stiff ODEs.
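  A sketch of this rule, with the implicit equation solved by simple
  fixed-point iteration (adequate for small <math|h> on non-stiff problems; a
  Newton solve would be used for stiff ones; names are illustrative):

```python
import numpy as np

def implicit_midpoint_step(f, t, x, h, iters=20):
    """One step of the 1-level 2nd-order implicit RK rule:
    solve K1 = h f(t + h/2, x + K1/2), then set x_{n+1} = x_n + K1."""
    K1 = h * f(t, x)                 # explicit Euler predictor as starting guess
    for _ in range(iters):           # fixed-point iteration on the K1 equation
        K1 = h * f(t + h/2, x + K1/2)
    return x + K1
```

  The fixed-point iteration converges when <math|h> times the Lipschitz
  constant of <math|<wide|f|\<vect\>>> is small; for stiff systems that
  condition fails, which is why Newton-type solvers are used there.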

  <subsection|Multi-steps methods: Adams methods>

  xxxxxxxx

  \;

  <section|Monte Carlo methods>

  Monte Carlo methods are a large class of methods that use statistical
  techniques to address numerical mathematical problems. They are treated in
  a separate note; see <verbatim|./monte-carlo-methods/monte-carlo-methods.tm(pdf)>.

  \;
</body>

<\initial>
  <\collection>
    <associate|page-medium|papyrus>
  </collection>
</initial>

<\references>
  <\collection>
    <associate|Gauss-Siedel-iteration-func|<tuple|13|23>>
    <associate|Jacobi-iteration-func|<tuple|12|23>>
    <associate|Newton|<tuple|9|5>>
    <associate|Newton-Sceant-multivars|<tuple|15|26>>
    <associate|Newton-method-multivars|<tuple|14|25>>
    <associate|SD|<tuple|8|5>>
    <associate|armijo-condition|<tuple|4|2>>
    <associate|auto-1|<tuple|1|2>>
    <associate|auto-10|<tuple|4|10>>
    <associate|auto-11|<tuple|2.3|10>>
    <associate|auto-12|<tuple|2.3.1|12>>
    <associate|auto-13|<tuple|2.3.2|12>>
    <associate|auto-14|<tuple|5|13>>
    <associate|auto-15|<tuple|2.4|14>>
    <associate|auto-16|<tuple|2.5|15>>
    <associate|auto-17|<tuple|2.5.1|15>>
    <associate|auto-18|<tuple|2.5.2|17>>
    <associate|auto-19|<tuple|3|18>>
    <associate|auto-2|<tuple|2|2>>
    <associate|auto-20|<tuple|3.1|18>>
    <associate|auto-21|<tuple|3.1.1|18>>
    <associate|auto-22|<tuple|3.1.2|19>>
    <associate|auto-23|<tuple|3.1.3|19>>
    <associate|auto-24|<tuple|3.1.4|20>>
    <associate|auto-25|<tuple|3.2|21>>
    <associate|auto-26|<tuple|3.2.1|21>>
    <associate|auto-27|<tuple|3.2.2|22>>
    <associate|auto-28|<tuple|3.2.3|23>>
    <associate|auto-29|<tuple|3.2.4|23>>
    <associate|auto-3|<tuple|2.1|2>>
    <associate|auto-30|<tuple|3.2.5|24>>
    <associate|auto-31|<tuple|3.2.6|24>>
    <associate|auto-32|<tuple|3.2.7|25>>
    <associate|auto-33|<tuple|3.3|25>>
    <associate|auto-34|<tuple|3.3.1|25>>
    <associate|auto-35|<tuple|3.3.2|26>>
    <associate|auto-36|<tuple|3.3.3|27>>
    <associate|auto-37|<tuple|3.3.4|29>>
    <associate|auto-38|<tuple|4|30>>
    <associate|auto-39|<tuple|4.1|30>>
    <associate|auto-4|<tuple|2.2|2>>
    <associate|auto-40|<tuple|4.2|32>>
    <associate|auto-41|<tuple|5|32>>
    <associate|auto-5|<tuple|2.2.1|2>>
    <associate|auto-6|<tuple|1|3>>
    <associate|auto-7|<tuple|2.2.2|5>>
    <associate|auto-8|<tuple|2|6>>
    <associate|auto-9|<tuple|3|9>>
    <associate|cc|<tuple|7|3>>
    <associate|curvature-condition|<tuple|5|2>>
    <associate|descent-condtion|<tuple|3|2>>
    <associate|iteration-eqns-linear|<tuple|11|22>>
    <associate|iteration-relation-of-optimization|<tuple|2|2>>
    <associate|iterative-eqns|<tuple|17|27>>
    <associate|linear-approx|<tuple|16|27>>
    <associate|optimization-problem|<tuple|1|2>>
    <associate|quasi-newton|<tuple|18|27>>
    <associate|sdc|<tuple|6|3>>
    <associate|solve-distance|<tuple|10|11>>
  </collection>
</references>

<\auxiliary>
  <\collection>
    <\associate|figure>
      <tuple|normal|<\surround|<hidden-binding|<tuple>|1>|>
        Region for <with|mode|<quote|math>|s<rsub|i+1>>: sdc(light-green) and
        cc(deep-green).
      </surround>|<pageref|auto-6>>

      <tuple|normal|<\surround|<hidden-binding|<tuple>|2>|>
        Subspace method in line-search algorithm [note that
        <with|mode|<quote|math>|<frac|\<partial\>f|\<partial\>d<rsub|0>><around*|\||<rsub|x<rsub|1>>|\<nobracket\>>=<with|font-series|<quote|bold>|d><rsub|0>\<cdot\><with|font-series|<quote|bold>|g><rsub|1>\<sim\>0>]
      </surround>|<pageref|auto-8>>

      <tuple|normal|<\surround|<hidden-binding|<tuple>|3>|>
        \;
      </surround>|<pageref|auto-9>>

      <tuple|normal|<\surround|<hidden-binding|<tuple>|4>|>
        <with|mode|<quote|math>|<around*|\<\|\|\>|x<rsub|<wide|t|^>>-x<rsub|2>|\<\|\|\>>=\<delta\><around*|\<\|\|\>|\<Delta\>x|\<\|\|\>>>.
      </surround>|<pageref|auto-10>>

      <tuple|normal|<\surround|<hidden-binding|<tuple>|5>|>
        Three cases of <with|mode|<quote|math>|m<around*|(|\<mu\>,\<nu\>|)>>:
        a) <with|mode|<quote|math>|m<around*|(|\<mu\>,\<nu\>|)>=<frac|1|2><around*|(|\<mu\><rsup|2>+<frac|\<nu\><rsup|2>|0.1<rsup|2>>|)>>;
        b) <with|mode|<quote|math>|m<around*|(|\<mu\>,\<nu\>|)>=<frac|1|2><around*|(|-\<mu\><rsup|2>+<frac|\<nu\><rsup|2>|0.1<rsup|2>>|)>>;
        c) <with|mode|<quote|math>|m<around*|(|\<mu\>,\<nu\>|)>=\<nu\><rsup|2>-\<mu\>>.
        Note: the best solution in the region
        <with|mode|<quote|math>|\<Delta\><rsub|1>> centered by
        <with|mode|<quote|math>|x<rsub|1>> is
        <with|mode|<quote|math>|x<rsub|2>> not
        <with|mode|<quote|math>|x<rsub|2><rprime|'>>.
      </surround>|<pageref|auto-14>>
    </associate>
    <\associate|toc>
      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|1<space|2spc>Introduction>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-1><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|2<space|2spc>Optimization
      problems> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-2><vspace|0.5fn>

      <with|par-left|<quote|1tab>|2.1<space|2spc>Unconstrained optimization
      problem <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-3>>

      <with|par-left|<quote|1tab>|2.2<space|2spc>Line search approach
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-4>>

      <with|par-left|<quote|2tab>|2.2.1<space|2spc>Line search algorithm
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-5>>

      <with|par-left|<quote|2tab>|2.2.2<space|2spc>Different methods to
      descent direction <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-7>>

      <with|par-left|<quote|1tab>|2.3<space|2spc>Trust region approach
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-11>>

      <with|par-left|<quote|2tab>|2.3.1<space|2spc>Newton, Quasi-Newton, LMF
      and CG methods <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-12>>

      <with|par-left|<quote|2tab>|2.3.2<space|2spc>Subspace method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-13>>

      <with|par-left|<quote|1tab>|2.4<space|2spc>Strict-inequality
      constrained optimization problem <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-15>>

      <with|par-left|<quote|1tab>|2.5<space|2spc>Constrained optimization
      problem <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-16>>

      <with|par-left|<quote|2tab>|2.5.1<space|2spc>penalty function method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-17>>

      <with|par-left|<quote|2tab>|2.5.2<space|2spc>Augmented Lagrange penalty
      function method <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-18>>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|3<space|2spc>Algebraic
      equations> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-19><vspace|0.5fn>

      <with|par-left|<quote|1tab>|3.1<space|2spc>Single variable eqn
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-20>>

      <with|par-left|<quote|2tab>|3.1.1<space|2spc>Bisection method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-21>>

      <with|par-left|<quote|2tab>|3.1.2<space|2spc>Iteration-function method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-22>>

      <with|par-left|<quote|2tab>|3.1.3<space|2spc>Mixed method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-23>>

      <with|par-left|<quote|2tab>|3.1.4<space|2spc>Accelerating iteration:
      Aitken method and it's super version
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-24>>

      <with|par-left|<quote|1tab>|3.2<space|2spc>Linear eqns
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-25>>

      <with|par-left|<quote|2tab>|3.2.1<space|2spc>Direct methods
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-26>>

      <with|par-left|<quote|2tab>|3.2.2<space|2spc>Iterative methods
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-27>>

      <with|par-left|<quote|2tab>|3.2.3<space|2spc>Jacobi method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-28>>

      <with|par-left|<quote|2tab>|3.2.4<space|2spc>Gauss-Siedel method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-29>>

      <with|par-left|<quote|2tab>|3.2.5<space|2spc>Successive-Over-Relaxation(SOR)
      method <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-30>>

      <with|par-left|<quote|2tab>|3.2.6<space|2spc>Aitken method and its
      super version <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-31>>

      <with|par-left|<quote|2tab>|3.2.7<space|2spc>The convergency problem
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-32>>

      <with|par-left|<quote|1tab>|3.3<space|2spc>Non-linear eqns
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-33>>

      <with|par-left|<quote|2tab>|3.3.1<space|2spc>Newton method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-34>>

      <with|par-left|<quote|2tab>|3.3.2<space|2spc>Newton-Sceant method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-35>>

      <with|par-left|<quote|2tab>|3.3.3<space|2spc>Quasi-Newton method
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-36>>

      <with|par-left|<quote|2tab>|3.3.4<space|2spc>Lenvenberg-Marquardt
      method <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-37>>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|4<space|2spc>Ordinary
      differential eqns: initial-value problems>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-38><vspace|0.5fn>

      <with|par-left|<quote|1tab>|4.1<space|2spc>One-step methods:
      Runge-Kutta method <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-39>>

      <with|par-left|<quote|1tab>|4.2<space|2spc>Multi-steps methods: Adams
      methods <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-40>>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|5<space|2spc>Monte
      Carlo methods> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-41><vspace|0.5fn>
    </associate>
  </collection>
</auxiliary>